CACR: Caltech's Center for Advanced Computing Research








 
 


Upcoming Seminars


Monday November 10, 2008, 1PM
Powell-Booth 100 (Seminar Room)

"Making and using VO tools to study the QSO distribution"

Giuseppe Longo
Department of Physics, University Federico II, Naples, Italy; INAF - Napoli

Due to the huge amount of data gathered by large optical surveys and by a new generation of space-borne experiments, astronomy has become a very “data rich” science. Exploiting the information contained in the astronomical archives, which have been federated and made accessible to the community through the International Virtual Observatory (VO) infrastructure, calls for the adoption of Intelligent Data Analysis (IDA) methods and tools. These extract patterns and trends almost automatically and can find application in almost all fields of observational astronomy. The talk is therefore divided into two parts. The first describes how and why, in the coming decades, IDA methodologies will play an increasing role in astronomy and beyond. The second discusses the preliminary results of a Euro-VO template “science case” concerning the evaluation of photometric redshifts for SDSS galaxies and QSOs and the identification of quasar candidates from combined UKIDSS and SDSS data.

 

Monday, Oct. 27, 11:00AM
Moore 080

"Models of interactions of Ca(2+), CaM, and monomeric catalytic subunits of CaMKII: a piece of the post-synaptic signaling network puzzle"
Dr. Shirley Pepke, Caltech Center for Advanced Computing Research

Calcium (Ca) signal transduction is a fundamental driver of synaptic plasticity in neurons. The molecule Calmodulin (CaM) is an important second messenger in Ca signaling in the post-synaptic density, integrating Ca levels via four binding sites. CaM transmits Ca signal information downstream through selective binding to target enzymes such as calmodulin-activated kinase II (CaMKII). Prior models of Ca/CaM/CaMKII have focused on the role of the unique holoenzyme structure of CaMKII in generating sensitivity and selectivity in response to dynamic Ca input. I will present models of Ca/CaM/CaMKII binding and phosphorylation reactions (developed within the Kennedy lab) that incorporate detailed representations of Ca/CaM and Ca/CaM/CaMKII binding states and explore the resulting impact on phosphorylation rates of monomeric catalytic subunits of CaMKII. Ca/CaM state models are seen to be necessary to accurately predict CaMKII phosphorylation levels under the low Ca conditions that are typical in neurons. Additionally, specific kinetic rate ranges in the models are shown to confer frequency sensitivity independent of a CaMKII holoenzyme structure. Sensitivity analysis on the estimated model parameters confirms these findings across sampled ranges of all parameters and points to areas where further experiments are necessary to establish quantitative values. While the results presented will be for numerical integration of an ODE representation of the reaction network, the models are easily implemented within a stochastic simulation framework that allows analysis of the response to Ca inputs with low molecule numbers as well as gradients in both time and space. The models promise new insight into the relative roles of thermodynamics, kinetics, molecular structure, and spatial distributions of signaling proteins in determining the synaptic response to Ca influxes.

Friday, March 28, 2008 2:00 PM
Powell-Booth Seminar Room (PB 100)

"SciFlo: Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment"
Brian Wilson, Jet Propulsion Laboratory

SciFlo is a system for Scientific Knowledge Creation on the Grid using a Semantically-Enabled Dataflow Execution Environment. SciFlo leverages Simple Object Access Protocol (SOAP) and REST-based Web Services and the Grid Computing standards (WS-* & Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable Web Services, python codes, and native executables into a distributed computing flow (tree of operators). The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or visually authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. Once an analysis has been specified for a chunk or day of data, it can be easily repeated with different control parameters or over months to years of data.

SciFlo was developed to enable large-scale, multi-instrument atmospheric science using multi-terabyte datasets from NASA’s Earth Observing System (EOS) sensors, namely AIRS, MODIS, MISR, and GPS. To support atmospheric and aerosol science, SciFlo deploys a variety of reusable web services and/or operators for data query, data access, parameter and space/time subsetting, data mining, data fusion, and custom analysis. By leveraging the capabilities of biopython, we plan to reapply SciFlo’s Grid Workflow capabilities to the bioinformatics realm.

In the talk, we will discuss the design of SciFlo, differentiate it from other grid workflow offerings, and demonstrate “live” its capabilities, including: visual programming in a browser by simply laying out the flowchart, “dead simple” declarative (XML) dataflow documents, heavy use of XML datatyping & some semantic web technologies, our parallel dataflow execution engine, automatic type/format conversions during the workflow, space/time data query services, data access simply by naming objects, and auto-distribution of operator/code bundles.
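
As a rough illustration of the "tree of operators" idea mentioned in the abstract, the Python sketch below evaluates a small dataflow tree from the leaves up. It is not SciFlo code and does not use the SciFlo API: the `Node` class, the operator names, and the sample values are all hypothetical, chosen only to show how reusable operators can be composed into a flow.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One operator in a dataflow tree (hypothetical, not the SciFlo API)."""
    name: str
    func: Callable[..., object]
    inputs: List["Node"] = field(default_factory=list)

    def run(self):
        # Evaluate the upstream operators first, then apply this operator
        # to their outputs.
        args = [child.run() for child in self.inputs]
        return self.func(*args)


# Leaf operators stand in for data-access services (made-up sample values).
source_a = Node("source_a", lambda: [1.0, 2.0, 3.0, 4.0])
source_b = Node("source_b", lambda: [10.0, 20.0, 30.0, 40.0])

# Subsetting operators keep the first two samples of each source.
subset_a = Node("subset_a", lambda xs: xs[:2], [source_a])
subset_b = Node("subset_b", lambda xs: xs[:2], [source_b])

# A fusion operator adds the two subsets elementwise; a final operator
# reduces the fused data to its mean.
fuse = Node("fuse", lambda a, b: [x + y for x, y in zip(a, b)],
            [subset_a, subset_b])
mean = Node("mean", lambda xs: sum(xs) / len(xs), [fuse])

result = mean.run()
print(result)
```

This sketch runs everything sequentially in one process. A distributed engine of the kind the abstract describes would additionally decide where each operator runs and could execute independent branches in parallel.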

Friday January 25, 2008, 3:00 pm
Powell-Booth 100

"Two Open Computational Problems in Biology"
Dr. C. Titus Brown, Caltech / Michigan State U

The number and variety of large-scale biological data sets are ever increasing, bringing new computational questions and problems with them. An especially fruitful area of biological inquiry has been engendered by the massive amount of sequencing data being produced by several new technologies. I'll discuss two specific biological approaches fostered by new technology: the use of comparative genomics to find and understand functional parts of genomes, and the use of microbial community sequencing to understand microbial ecosystems. Both are important areas of research with many computational questions buried within them.

Friday, November 16, 2:00PM
Powell-Booth 100

"Astroinformatics and Petascale Mining of Large Astronomy Sky Survey Databases"
Kirk Borne, Department of Computational and Data Sciences, George Mason University

I will describe the new data-intensive research paradigm that astronomy and astrophysics are now entering. This is described within the context of the largest data-producing astronomy project in the coming decade -- the LSST (Large Synoptic Survey Telescope). The enormous data output, database contents, knowledge discovery, and community science expected from this project will impose massive data challenges on the astronomical research community. One of these challenge areas is the rapid machine learning, data mining, and classification of all novel astronomical events from each 3-gigapixel (6-GB) image obtained every 20 seconds throughout every night for the project duration of 10 years. LSST may produce as many as 100,000 such events each and every night. I will describe the status of the LSST project, as well as the emerging data mining research opportunities within the project. This interdisciplinary research program spans many disciplines: astronomy, machine learning (data mining), XLDB (extremely large databases), scientific visualization, computational science, and science education. The latter includes Astroinformatics: the formalization of data-intensive astronomy for research and education.
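
To give a feel for the data volumes implied by the figures in the abstract (a 6-GB image every 20 seconds for 10 years), here is a back-of-the-envelope calculation in Python. The image size, cadence, and survey duration come from the abstract. The number of observing hours per night and observing nights per year are assumptions made here for illustration, as is the use of decimal units (1 TB = 1000 GB), so the totals are order-of-magnitude estimates only.

```python
# Figures quoted in the abstract.
IMAGE_SIZE_GB = 6    # one 3-gigapixel image
CADENCE_S = 20       # one image every 20 seconds
SURVEY_YEARS = 10    # project duration

# Assumptions for illustration only (not stated in the abstract).
HOURS_PER_NIGHT = 10
NIGHTS_PER_YEAR = 300

# Sustained raw image rate while observing.
rate_gb_per_s = IMAGE_SIZE_GB / CADENCE_S

# Images and raw data volume per night.
images_per_night = HOURS_PER_NIGHT * 3600 // CADENCE_S
tb_per_night = images_per_night * IMAGE_SIZE_GB / 1000

# Raw image volume over the whole survey.
pb_total = tb_per_night * NIGHTS_PER_YEAR * SURVEY_YEARS / 1000

print(f"{rate_gb_per_s:.2f} GB/s while observing")
print(f"{images_per_night} images per night, about {tb_per_night:.1f} TB")
print(f"about {pb_total:.1f} PB of raw images over {SURVEY_YEARS} years")
```

Under these assumptions the raw images alone amount to roughly ten terabytes per night and tens of petabytes over the survey, before any derived catalogs or databases are counted.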

(IST Seminar) Tuesday Sept 18, 12PM
Moore 080

"Subspectral Algorithms for Sparse Learning, Optimization & Inference"

Baback Moghaddam
Machine Learning Group
Jet Propulsion Laboratory/California Institute of Technology

I will present a general class of "subspectral" algorithms (sparse eigenvector techniques) for solving NP-hard combinatorial optimization problems in three applied domains: 1) (Un)Supervised Learning (e.g. PCA & LDA), 2) Quadratic/Entropic Optimization (e.g. Least-Squares & MaxEnt) and 3) Bayesian Inference (e.g. Automatic Relevance Determination). Efficient algorithms for both optimal and approximate greedy solutions are derived using analytic eigenvalue bounds. Sample applications presented include "sparse PCA" for variable selection (in statistics), "sparse LDA" for classification (gene discovery), sparse kernel regression (robotics & control), sparse quadratic programming (portfolio optimization), graph model selection (sensor networks) and sparse Bayesian inference for computer vision (face recognition & OCR).

Tuesday, August 21, 3:00PM
Powell-Booth 100

"High-Performance Computing and Feature Animation"
Ron Henderson, DreamWorks Animation

Entertainment has an ever-increasing appetite for high-performance computing. From the dazzling visual effects and compositing magic seen in live-action films to the virtual worlds of feature animation, computer graphics for entertainment requires significant computing power. In this talk we look at the computing requirements for high-end feature animation in particular and survey the applications from rendering to physical simulation that are the most demanding. Examples will be drawn from recent DreamWorks releases including Shrek the Third.

Friday, April 20, 11:00 AM
100 Powell-Booth / CACR Seminar Room

"Overcoming Obstacles with Graph Based Programming Models"
Scott B. Baden
Department of Computer Science and Engineering
University of California, San Diego

Traditional approaches to implementing scalable applications are based on synchronous parallelism, and divide an application into distinct phases of communication and computation. I'll discuss an alternative approach based on graph-based execution, which treats data motion and computation as coupled, simultaneous activities. This model is more closely matched to applications modeling asynchronous processes or that employ asynchronous execution, for example, to overlap communication with computation. Experiences with the Thyme and Tarragon run-time graph-based substrates reveal their ability to simplify the design of asynchronous algorithms, reduce data transfer overheads, or both. I'll also discuss some details of the APIs and the underlying run time services, and discuss extensions of the model to GPU hardware.

Monday March 12, 2:00PM
100 Powell-Booth Seminar Room

"StatPatternRecognition: A C++ Package for Multivariate Classification"

Ilya Narsky, Caltech High Energy Physics

SPR implements various tools for supervised learning such as boosting, bagging, random forest, neural networks, decision trees, bump hunter (PRIM), and others. It is a standalone package with an optional dependency on Root for data input/output. SPR was crafted for the needs of the HEP community and is now being used by several HEP collaborations, as well as by a few non-HEP users. It is distributed under the GPL via Sourceforge: http://sourceforge.net/projects/statpatrec/ . More info on the project is available from http://www.hep.caltech.edu/~narsky/spr.html .

Tuesday February 27, 2007. 2:00PM
100 Powell-Booth Seminar Room

"User-oriented supercomputing or how to increase language usability without sacrificing performance."

Professor Nikolay N. Mirenkov, Department of Computer Software, University of Aizu, Aizu-Wakamatsu, Japan

A new programming paradigm oriented to application programmers and researchers will be presented. It is based on "filmification" of computational methods and on an environment supporting the development of self-explanatory software components. Algorithmic cyberFilm is an abstraction combining both mathematical and physical concepts. It is a set of multimedia frames representing a variety of algorithmic features. These features are a basis for bridging the gap between "syntax and semantics" and understanding the corresponding component meaning. The self-explanatory concept is also an abstraction; however, it is intuitively much more understandable and allows employing a number of "fuzzy" views to represent the accurate meaning. CyberFilms as pieces of "active" knowledge are acquired in a film database. The cyberFilm frames are watchable and editable in a non-linear order according to the user's demands. Examples of algorithmic cyberFilms, as well as their compactness and understandability for users, will be presented. Basic features of a new type of supercompiler and how to preserve the performance of very-high-level language constructs will also be considered.