CS700: Graduate Seminar in Computer Science & Informatics

Bayesian model-based methods for analyzing ChIP sequencing data

Protein-DNA interaction constitutes a basic mechanism for genetic regulation of target gene expression. Deciphering this mechanism is challenging because protein-bound DNA is difficult to characterize on a genomic scale. The recent arrival of ultra-high-throughput sequencing technologies has revolutionized this field by allowing rapid and cost-effective quantitative sequencing analysis of target DNAs. ChIP-Seq, which couples chromatin immunoprecipitation (ChIP) with next-generation sequencing, provides millions of short-read sequences, representing tags of DNAs bound by specific transcription factors and other chromatin-associated proteins. The rapid accumulation of ChIP-Seq data has created a daunting analysis challenge. Here we discuss several interesting problems that arise from analyzing ChIP-Seq data, namely peak calling, motif finding, and data integration. Solving these problems requires state-of-the-art statistical modeling techniques as well as advanced computational algorithms.
Steve Qin is Associate Professor of Biostatistics and Bioinformatics at Emory. He received a PhD (Statistics) from the University of Michigan in 2000 and a BS from Peking University in 1994. He was a post-doctoral fellow at Harvard from 2000 to 2003, and an assistant professor at Michigan from 2003 to 2010. His research includes the application of statistical and informatics methods to study drug interactions, microarray gene expression data sets, and genome.