CACR Seminar: Anthony Goldbloom, Kaggle

Anthony Goldbloom
Kaggle (http://www.kaggle.com/)

Thursday January 6, 2011
2:30 PM
100 Powell-Booth

Abstract

Machine learning and data prediction are crucial to most organizations. Banks predict which loan applicants are likely to default, treasuries forecast tax revenues, and medical researchers predict the likelihood of illness from gene sequences.

Crowdsourced data mining can lead to vastly better models. My project, Kaggle, recently hosted a bioinformatics contest, which required participants to pick markers in a series of genetic sequences that predict the progression of HIV. Within a week and a half, the best submission had already outdone the best methods in the scientific literature.

This result neatly illustrates the strength of competitions. Whereas the scientific literature or in-house models tend to evolve slowly (somebody tries something, somebody else tweaks that approach and so on), a competition inspires rapid innovation by introducing the problem to a wide audience. There are an infinite number of approaches that can be applied to any machine learning problem and it is impossible to know at the outset which technique will be most effective.

Bio

Anthony is the Founder and CEO of Kaggle, a global platform for data prediction competitions. In addition to founding Kaggle, Anthony continues to consult to hosts of Kaggle competitions to help them frame prediction tasks, to get the best out of the new platform and help them integrate insights into their day-to-day operations.

Before Kaggle, Anthony was a macroeconomic modeler for the Reserve Bank of Australia and before that the Australian Treasury. In these roles, Anthony built and maintained macroeconomic models of Australia’s economy to improve forecasting and model the economic effect of changes in policy parameters, such as interest rates and fiscal policy.

Anthony graduated with first class honours in econometrics at the University of Melbourne and has published in The Economist magazine and the Australian Economic Review.

IST Seminar – Mark Stalzer, CACR

IST Lunch Bunch
Tuesday November 2, 2010
12:00 PM
105 Annenberg

A (Hypothetical) Data to Discovery Engine
Mark Stalzer, Caltech – CACR

Description/Abstract:
Moore’s law works for semiconductor-based detectors and there is an increasing flood of data being generated in astronomy, high energy physics, biology, and other sciences. Computation is essential for both (1) making predictions from theory and (2) the analysis of data from experiments. Is there a way to architecturally balance both needs in a high performance computing (HPC) system?

HPC systems are typically constructed from easily available parts, just organized differently and scaled to process extreme workloads. The most power efficient petascale machine as of 2010 is Roadrunner at the Los Alamos National Laboratory. Another interesting machine is the Apple iPad which uses very low power System on a Chip/Package on Package technologies. This talk explores the question of “what happens when you cross Roadrunner with iPads”? The result is a high level of integration between computation and storage on a single server blade, called a Flashblade, with 100x-1,000x performance improvements for some data-centric applications. The Flashblade architecture, expected performance, programming, and scaling with advancing technology are discussed.

(See publication page for PDF link)

Astronomy Colloquium: CRTS: An Open Optical Transient Survey

CRTS: An Open Optical Transient Survey
Date: Wednesday October 6, 2010 4:00 PM – 5:00 PM
Location: Cahill Center, Hameetman Auditorium

Andrew Drake, computational scientist, Caltech Center for Advanced Computing Research

The Catalina Real-time Transient Survey (CRTS) is a Caltech-operated optical transient survey that covers most of the Northern and Southern sky in search of transient astrophysical phenomena occurring on timescales of minutes to years. The project uses data from the Catalina Sky Survey NEO search and began real-time discovery and publication of transient events in November 2007. CRTS has found thousands of sources ranging from UV Ceti and dwarf nova outbursts to supernovae and blazars. I will discuss the survey, the discoveries made to date, and our efforts to provide immediate open access to CRTS discoveries and historical CSS data.

* For further information: contact Gina Armas gina@its.caltech.edu phone: 4671
For the full scoop, see event web page: http://www.astro.caltech.edu/~gma/colloquia.html.
* Sponsored by: Physics, Math and Astronomy

A Universe of Astronomical Data

Article in the Summer 2010 issue of Engineering & Science Magazine about CACR’s participation in Astroinformatics:

A Universe of Astronomical Data

“After a decade of developing the tools and infrastructure needed to get these databases to talk to each other, the project, now called the Virtual Astronomical Observatory and funded by NASA and the NSF, opened for business in May. “We’re moving onto the operational phase,” says [Matthew] Graham, a member of the program council of the VAO. “The hope is that we can really make an impact on the community.” In addition to Graham, CACR computational scientist Roy Williams also plays a leading role with the VAO.”

New Cluster for Theoretical AstroPhysics Installed

Rendering of a rapidly spinning, gravitational-wave emitting newborn neutron star. Simulation: Ott et al. 2007 Rendering: Ralf Kaehler ZIB/AEI/KIPAC 2007

This month CACR has installed and configured a new cluster in the Powell-Booth Laboratory for Computational Science. This system is specifically configured to meet the applications needs of Caltech’s Theoretical AstroPhysics Including Relativity (TAPIR) group in the Physics, Mathematics, and Astronomy Division.

The MRI2 cluster is funded by an NSF MRI-R2 award with matching funds from the Sherman Fairchild Foundation. The configuration, integrated by Hewlett-Packard and CACR’s operations team, consists of 1536 Intel X5650 compute cores in 128 dual Westmere hex-core nodes equipped with a total of ~3 TB of memory, connected via QDR InfiniBand (IB). It includes 100 TB of high-performance, high-reliability disk space accessed via IB through a Panasas rack.
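
As a rough cross-check of the configuration figures above, the core count follows from 128 nodes, two sockets per node ("dual"), and six cores per socket ("hex-core"). The short sketch below reproduces that arithmetic and estimates memory per node and per core. It assumes that "~3 TB" means exactly 3 x 1024 GB, which the announcement does not state; the function names are illustrative only.

```python
# Cross-check of the cluster figures quoted in the announcement.
NODES = 128                 # stated: 128 nodes
SOCKETS_PER_NODE = 2        # stated: "dual" Westmere
CORES_PER_SOCKET = 6        # stated: "hex-core"
TOTAL_MEMORY_GB = 3 * 1024  # ASSUMPTION: "~3 TB" taken as exactly 3072 GB


def total_cores(nodes: int, sockets: int, cores: int) -> int:
    """Total compute cores across the whole cluster."""
    return nodes * sockets * cores


def memory_per_node_gb(total_gb: float, nodes: int) -> float:
    """Average memory per node, in GB."""
    return total_gb / nodes


def memory_per_core_gb(total_gb: float, cores: int) -> float:
    """Average memory per core, in GB."""
    return total_gb / cores


if __name__ == "__main__":
    cores = total_cores(NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET)
    print("total cores:", cores)
    print("memory per node (GB):", memory_per_node_gb(TOTAL_MEMORY_GB, NODES))
    print("memory per core (GB):", memory_per_core_gb(TOTAL_MEMORY_GB, cores))
```

Under that assumption the stated total of 1536 cores is consistent with the node description, and the memory works out to roughly 24 GB per node, or about 2 GB per core.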

The research project using the new cluster, Simulating eXtreme Spacetimes: Facilitating LIGO and Enabling Multi-Messenger Astronomy, is led by Professor Christian Ott. The co-Investigators on the MRI award are Dr. Mark Scheel of TAPIR and CACR’s director, Dr. Mark Stalzer. The research will explore the dynamics of spacetime curvature, matter, and radiation at high energies and densities. Central project aspects are the simulation of black hole binary coalescence, neutron star-black hole inspiral and merger, and the collapse of massive stars leading to core-collapse supernovae or gamma-ray bursts. Key results will be the prediction of gravitational waveforms from these phenomena to enable LIGO gravitational wave searches and to facilitate the extraction of (astro-)physics from observed signals.

The MRI2 cluster is named Zwicky, in honor of Caltech Astrophysics Professor Fritz Zwicky (1898-1974), who discovered supernovae and who was the first to explain how supernovae can be powered by the collapse of a massive star into a neutron star. Zwicky also discovered the first evidence for dark matter in our universe, proposed to use supernovae as standard candles to measure distances in the universe, and suggested that galaxy clusters could act as gravitational lenses.