Seminar: Clustering of Genome-wide Chromatin Mark Data Using Self-Organizing Maps

Monday Oct. 31, 2011
10AM
Powell-Booth 100

Clustering of Genome-wide Chromatin Mark Data Using Self-Organizing Maps

Shirley Pepke, Ali Mortazavi, and Barbara Wold

Genome-wide sequence-based assays such as ChIP-seq offer unprecedented opportunities to characterize and predict regulatory sequence regions such as those corresponding to enhancers. The ENCODE Project, in particular, has made available the results of hundreds of high-throughput assays spanning multiple cell types. The amount and complexity of these data make them a rich source for knowledge mining; however, they also present significant computational challenges. We have used Self-Organizing Maps (SOMs) as part of a pipeline to integratively analyze ENCODE Tier 1 and Tier 2 cell type data by performing a fine-grained clustering of genome segments based upon the vector of ChIP-seq signal levels within each segment. Because SOMs generate a topographic mapping of the input data onto a grid of prototype vectors such that the proximity of two vectors on the map indicates their similarity, a key advantage for interpretability is the embedding of higher-level relationships (i.e., relationships between clusters) within the maps. The input vectors we use are constructed from experimental data for a large number of histone marks (plus some transcription factors and open chromatin assays); thus the SOM prototype vectors underlying the 2D mapping are high-dimensional, and some care is required in interpreting the map landscape. Here we look at different techniques for clustering the SOM prototype vectors in order to discriminate visually observed patterns at different levels of detail. We discuss implications of the clustering paradigm for biological interpretability and for determining functional relationships of genomic segments.
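For readers unfamiliar with SOMs, the idea of mapping signal vectors onto a grid of prototype vectors can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 4x4 grid, the learning schedule, and the synthetic three-component "signal" vectors are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(protos, x):
    """Return the grid coordinate whose prototype is closest to x (squared Euclidean)."""
    return min(protos, key=lambda rc: sum((w - v) ** 2 for w, v in zip(protos[rc], x)))


def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) to a prototype vector."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0  # initial neighborhood radius (assumed schedule)
    # Initialize each prototype from a randomly chosen input vector.
    protos = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks, floored at 0.5
            x = data[i]
            br, bc = best_matching_unit(protos, x)
            # Pull every prototype toward x, weighted by grid distance to the winner.
            for (r, c), w in protos.items():
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return protos


# Synthetic stand-in for per-segment signal vectors: two clearly distinct patterns.
rng = random.Random(1)
group_a = [[1.0 + rng.gauss(0, 0.05), rng.gauss(0, 0.05), rng.gauss(0, 0.05)]
           for _ in range(20)]
group_b = [[rng.gauss(0, 0.05), 1.0 + rng.gauss(0, 0.05), rng.gauss(0, 0.05)]
           for _ in range(20)]

protos = train_som(group_a + group_b, rows=4, cols=4)
bmu_a = best_matching_unit(protos, [1.0, 0.0, 0.0])
bmu_b = best_matching_unit(protos, [0.0, 1.0, 0.0])
print("unit for pattern A:", bmu_a, "unit for pattern B:", bmu_b)
```

In this toy setting the two patterns land on different map units, and units that are close on the grid hold similar prototypes. Real ChIP-seq inputs would have many more dimensions, which is why the abstract cautions about interpreting the map landscape.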

Employment Opportunity at CACR: Computational Scientist

Position Description:

The Center for Advanced Computing Research at the California Institute of Technology is seeking a highly motivated individual to engineer scientific software to make effective use of accelerators, particularly general-purpose graphics processing units (GPGPUs). There are applications in several research groups, including geophysics, solid mechanics, chemistry, and biology. The initial responsibility will be optimizing codes to exploit a large new hybrid (CPU/GPGPU) cluster in the Division of Geological and Planetary Sciences. Specific applications include Bayesian models of fault slip during large earthquakes, inverse models of the Earth’s interior structure, large-scale remotely sensed image processing and models for use in rapid tsunami early warning systems.

Responsibilities:
  • Engineer scientific codes in multiple disciplines to make effective use of accelerators.
  • Document work and train students and staff in accelerator programming.
  • Serve as a campus-wide subject matter expert on accelerator programming.
  • Collaborate with experimental and scientific teams to deploy scientific software and respond to research challenges.
  • Contribute to the writing of papers and grant proposals.
  • Other duties as requested.
Requirements & Qualifications:
  • B.S. in computer science, physics, applied mathematics, or a closely related field.
  • A minimum of 2 years of experience programming accelerators for scientific or closely related applications.
  • Thorough knowledge of C/C++, OpenCL, and CUDA.
Applications should be made via the Caltech Job Posting.

Caltech is an equal opportunity/affirmative action employer. Women, minorities, veterans and disabled persons are encouraged to apply.

CACR Seminar: “Modern Time Series Analysis of Three Cycles of Solar Chromospheric Activity”

“Modern Time Series Analysis of Three Cycles of Solar Chromospheric Activity”
Jeff Scargle
NASA Ames; Distinguished Visiting Scholar, Keck Institute for Space Studies

Thursday Oct 6
11AM
100 Powell-Booth

Astronomical programs such as NASA’s Kepler Mission, Caltech’s Catalina Real-Time Transient Survey and Palomar Transient Factory, and many other all-sky photometric surveys (past, present, and future) demand efficient, automatic methods for extracting information from time series data. I will describe algorithms for standard and novel analyses covering:

* any data mode (events, counts in bins, point measurements with errors, etc.)
* time, frequency, and time-frequency domains
* auto- and cross- modes for single and multiple time series

Selected application examples will focus on three and a half decades of data from the NSO/AFRL/Sac Peak K-line monitoring program. Power spectrum and time-frequency analyses elucidate the solar cycle and an underlying random process, and reveal a new periodicity possibly connected with internal solar MHD activity.
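As a rough illustration of power spectrum analysis on unevenly sampled point measurements, the sketch below computes a classical Lomb-Scargle periodogram. The abstract does not say which algorithms the talk covers, so the choice of this technique is an assumption. The data are synthetic (a sinusoid with an 11-year period plus noise, sampled at random times over 35 years) and are not the K-line measurements; all names are hypothetical.

```python
import math
import random


def lomb_scargle(times, values, ang_freqs):
    """Classical Lomb-Scargle power at each positive angular frequency."""
    mean = sum(values) / len(values)
    y = [v - mean for v in values]  # work with mean-subtracted data
    power = []
    for w in ang_freqs:
        # Time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cs = [math.cos(w * (t - tau)) for t in times]
        sn = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, cs))
        ys = sum(a * b for a, b in zip(y, sn))
        cc = sum(c * c for c in cs)
        ss = sum(s * s for s in sn)
        power.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return power


# Synthetic, irregularly sampled series (assumed values, for illustration only).
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 35.0) for _ in range(300))  # years
values = [math.sin(2.0 * math.pi * t / 11.0) + rng.gauss(0, 0.2) for t in times]

freqs = [0.02 + 0.002 * i for i in range(241)]  # cycles per year, 0.02 to 0.5
power = lomb_scargle(times, values, [2.0 * math.pi * f for f in freqs])
best_period = 1.0 / freqs[power.index(max(power))]
print("period at peak power (years):", round(best_period, 2))
```

The peak of the periodogram falls near the injected 11-year period even though the samples are not evenly spaced, which is the property that makes this kind of estimator useful for survey data.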

CACR Seminar: Toward the ‘grand unified theory’ of user interface

“Toward the ‘grand unified theory’ of user interface”

Jiao Lin
Computational Scientist, Center for Advanced Computing Research (CACR)

Tuesday Sept 20
11AM
100 Powell-Booth

Abstract: Intuitive, responsive, and clean graphical user interfaces have become more and more important for scientific software applications. Building a graphical user interface is tedious, however. Without extreme care, a user interface application can easily become unnecessarily complex and convoluted and, as a result, unmaintainable. Building a web-based graphical user interface is harder still, because browsers implement languages inconsistently and because multiple languages, standards, and platforms may be involved; this makes management of a web UI project expensive and sometimes chaotic. With the emergence of cloud computing, we will see many scientific computing packages turn to the cloud and demand web or mobile-device user interfaces, while the traditional desktop user interface still has a large user base. A much simpler route to developing desktop, web, and mobile-device user interfaces is needed. This work looks for the most compact set of abstract concepts and principles sufficient for constructing sophisticated UIs. In practice, it aims to reduce the chaos and agony of building user interface applications, to dramatically lower the barrier to creating good user interfaces, and to make them much easier to maintain and evolve.

CACR-hosted Theoretical AstroPhysics Cluster expands

The MRI2 cluster, meeting the application needs of Caltech’s Theoretical AstroPhysics Including Relativity (TAPIR) group, has expanded to include an additional 40 compute nodes. The configuration, integrated by Hewlett-Packard and CACR’s Operations team, now consists of 2016 Intel X5650 compute cores, in 168 dual Westmere hex-core nodes, equipped with ~4 TB of memory (2 GB/core). The cluster is connected by a 2:1 fat tree QDR InfiniBand network, with high-speed access to 80 TB (usable) of high-performance Panasas storage and 48 TB of archival storage. For more information about the system, see the CACR Facilities & Operations page.

The MRI2 cluster, called “zwicky”, provides compute and storage resources for research codes investigating core-collapse supernovae, gamma-ray bursts, black holes and neutron stars.