Caltech Center for Advanced Computing Research

Seminar: Clustering of Genome-wide Chromatin Mark Data Using Self-Organizing Maps

Monday Oct. 31, 2011
10AM
Powell-Booth 100

Shirley Pepke, Ali Mortazavi, and Barbara Wold

Genome-wide sequence-based assays such as ChIP-seq offer unprecedented opportunities to characterize and predict regulatory sequence regions, such as those corresponding to enhancers. The ENCODE Project, in particular, has made available the results of hundreds of high-throughput assays spanning multiple cell types. The amount and complexity of these data make them a rich source for knowledge mining, but they also present significant computational challenges. We have used Self-Organizing Maps (SOMs) as part of a pipeline to integratively analyze ENCODE Tier 1 and Tier 2 cell type data by performing a fine-grained clustering of genome segments based on the vector of ChIP-seq signal levels within each segment. SOMs generate a topographic mapping of the input data onto a grid of prototype vectors such that the proximity of two vectors on the map indicates their similarity. A key advantage for interpretability is therefore the embedding of higher-level relationships (relationships between clusters) within the maps. The input vectors we use are constructed from experimental data for a large number of histone marks (plus some transcription factors and open chromatin assays); the SOM prototype vectors underlying the 2D mapping are thus high-dimensional, and some care is required in interpreting the map landscape. Here we look at different techniques for clustering the SOM prototype vectors in order to discriminate visually observed patterns at different levels of detail. We discuss implications of the clustering paradigm for biological interpretability and for determining functional relationships of genomic segments.
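
For readers unfamiliar with SOMs, the mapping of signal vectors onto a grid of prototype vectors can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic stand-ins for per-segment signal vectors, the grid size, learning-rate schedule, and neighborhood schedule are arbitrary assumptions, and the names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def best_matching_unit(protos, x):
    """Return the grid coordinate whose prototype is closest to x (squared Euclidean)."""
    return min(protos, key=lambda u: sum((a - b) ** 2 for a, b in zip(protos[u], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): prototype vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighborhood radius
    # Initialise each prototype as a copy of a randomly chosen input vector.
    protos = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood, floored at 0.5
            br, bc = best_matching_unit(protos, x)
            for (r, c), w in protos.items():
                # Gaussian neighborhood measured on the 2D grid, not in signal space.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return protos


# Synthetic "segments": two groups with opposite patterns across four hypothetical marks.
noise = random.Random(1)
group_a = [[1 + noise.gauss(0, 0.05), 1 + noise.gauss(0, 0.05),
            noise.gauss(0, 0.05), noise.gauss(0, 0.05)] for _ in range(20)]
group_b = [[noise.gauss(0, 0.05), noise.gauss(0, 0.05),
            1 + noise.gauss(0, 0.05), 1 + noise.gauss(0, 0.05)] for _ in range(20)]

protos = train_som(group_a + group_b, rows=3, cols=3)

# Each segment is assigned to the map unit holding its closest prototype.
units_a = {best_matching_unit(protos, x) for x in group_a}
units_b = {best_matching_unit(protos, x) for x in group_b}
print("units used by group A:", sorted(units_a))
print("units used by group B:", sorted(units_b))
```

In this toy setting the two groups land on separate map units. A second clustering step over the values of `protos`, of the kind the abstract describes, would then group units rather than individual segments.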