NEWS & EVENTS
Upcoming Seminars

Friday, March 28, 2008 2:00 PM
"SciFlo: Scientific Knowledge Creation on the Grid Using a Semantically-Enabled Dataflow Execution Environment"

SciFlo is a system for Scientific Knowledge Creation on the Grid using a Semantically-Enabled Dataflow Execution Environment. SciFlo leverages Simple Object Access Protocol (SOAP) and REST-based Web Services and the Grid Computing standards (WS-* & Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable Web Services, Python codes, and native executables into a distributed computing flow (tree of operators). The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or visually authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. Once an analysis has been specified for a chunk or day of data, it can be easily repeated with different control parameters or over months to years of data.

SciFlo was developed to enable large-scale, multi-instrument atmospheric science using multi-terabyte datasets from NASA’s Earth Observing System (EOS) sensors, namely AIRS, MODIS, MISR, and GPS. To support atmospheric and aerosol science, SciFlo deploys a variety of reusable web services and/or operators for data query, data access, parameter and space/time subsetting, data mining, data fusion, and custom analysis. By leveraging the capabilities of Biopython, we plan to reapply SciFlo’s Grid Workflow capabilities to the bioinformatics realm.
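The "tree of operators" idea in the abstract can be pictured with a toy example. The following is only a minimal sketch, not SciFlo's API or XML format: the `Operator` class, the `run_flow` helper, and the three stand-in operators (query, subset, mean) are all hypothetical names chosen for illustration, and everything runs locally rather than on a Grid.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Operator:
    """One node of a dataflow tree (hypothetical, not a SciFlo class)."""
    name: str
    func: Callable[..., Any]
    inputs: list[Operator] = field(default_factory=list)

    def run(self, params: dict[str, dict[str, Any]]) -> Any:
        # Evaluate upstream operators first, then apply this operator to
        # their results plus any control parameters registered under its name.
        upstream = [child.run(params) for child in self.inputs]
        return self.func(*upstream, **params.get(self.name, {}))


# Stand-in operators: a data query, a value-range subsetter, and an analysis step.
query = Operator("query", lambda day: [day + 0.0, day + 1.0, day + 2.0, day + 3.0])
subset = Operator(
    "subset",
    lambda values, lo, hi: [v for v in values if lo <= v <= hi],
    inputs=[query],
)
flow = Operator("mean", lambda values: sum(values) / len(values), inputs=[subset])


def run_flow(day: int, lo: float, hi: float) -> float:
    """Run the same flow with a different day or different control parameters."""
    return flow.run({"query": {"day": day}, "subset": {"lo": lo, "hi": hi}})


one_day = run_flow(1, 2.0, 4.0)
several_days = [run_flow(day, 0.0, 100.0) for day in range(3)]
```

Once the flow is defined, repeating it for another day or with other thresholds is a matter of changing the parameter dictionary, which mirrors the abstract's point about rerunning an analysis over longer time spans.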
In the talk, we will discuss the design of SciFlo, differentiate it from other grid workflow offerings, and demonstrate “live” its capabilities, including: visual programming in a browser by simply laying out the flowchart, “dead simple” declarative (XML) dataflow documents, heavy use of XML datatyping & some semantic web technologies, our parallel dataflow execution engine, automatic type/format conversions during the workflow, space/time data query services, data access simply by naming objects, and auto-distribution of operator/code bundles.

Friday January 25, 2008, 3:00 pm
"Two Open Computational Problems in Biology"

The number and variety of large-scale biological data sets are ever increasing, bringing new computational questions and problems with them. An especially fruitful area of biological inquiry has been engendered by the massive amount of sequencing data being produced by several new technologies. I'll discuss two specific biological approaches fostered by new technology: the use of comparative genomics to find and understand functional parts of genomes, and the use of microbial

Friday, November 16, 2:00PM
"Astroinformatics and Petascale Mining of Large Astronomy Sky Survey Databases"

I will describe the new data-intensive research paradigm that astronomy and astrophysics are now entering. This is described within the context of the largest data-producing astronomy project in the coming decade, the LSST (Large Synoptic Survey Telescope). The enormous data output, database contents, knowledge discovery, and community science expected from this project will impose massive data challenges on the astronomical research community. One of these challenge areas is the rapid machine learning, data mining, and classification of all novel astronomical events from each 3-gigapixel (6-GB) image obtained every 20 seconds throughout every night for the project duration of 10 years. LSST may produce as many as 100,000 such events each and every night.
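For a rough sense of scale, the figures quoted in the abstract can be turned into a back-of-the-envelope data rate. This is only a sketch: the 10-hour observing night is an assumption made here for illustration and does not come from the abstract, and 1 GB is taken as 10^9 bytes.

```python
# Figures taken from the abstract.
IMAGE_BYTES = 6e9            # 6 GB per image (decimal gigabytes assumed)
IMAGE_PIXELS = 3e9           # 3 gigapixels per image
CADENCE_SECONDS = 20         # one image every 20 seconds
EVENTS_PER_NIGHT = 100_000   # upper figure quoted for novel events

# ASSUMPTION (not in the abstract): a hypothetical 10-hour observing night.
NIGHT_HOURS = 10

bytes_per_pixel = IMAGE_BYTES / IMAGE_PIXELS
sustained_rate = IMAGE_BYTES / CADENCE_SECONDS           # bytes per second
images_per_night = NIGHT_HOURS * 3600 // CADENCE_SECONDS
bytes_per_night = images_per_night * IMAGE_BYTES
events_per_image = EVENTS_PER_NIGHT / images_per_night

print(f"{bytes_per_pixel:.0f} bytes per pixel")
print(f"{sustained_rate / 1e9:.1f} GB/s sustained while observing")
print(f"{images_per_night} images, {bytes_per_night / 1e12:.1f} TB per assumed night")
print(f"about {events_per_image:.0f} candidate events per image")
```

Under that assumed night length, each image would carry a few dozen candidate events on average, which illustrates why the classification step has to keep pace with the 20-second cadence.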
I will describe the status of the LSST project, as well as the emerging data mining research opportunities within the project. This interdisciplinary research program spans many disciplines: astronomy, machine learning (data mining), XLDB (extremely large databases), scientific visualization, computational science, and science education. The latter includes Astroinformatics: the formalization of data-intensive astronomy for research and education.

(IST Seminar) Tuesday Sept 18, 12PM
"Subspectral Algorithms for Sparse Learning, Optimization & Inference"
Baback Moghaddam

I will present a general class of "subspectral" algorithms (sparse eigenvector techniques) for solving NP-hard combinatorial optimization problems in three applied domains: 1) (Un)Supervised Learning (e.g. PCA & LDA), 2) Quadratic/Entropic Optimization (e.g. Least-Squares & MaxEnt), and 3) Bayesian Inference (e.g. Automatic Relevance Determination). Efficient algorithms for both optimal and approximate greedy solutions are derived using analytic eigenvalue bounds. Sample applications presented include "sparse PCA" for variable selection (in statistics), "sparse LDA" for classification (gene discovery), sparse kernel regression (robotics & control), sparse quadratic programming (portfolio optimization), graph model selection (sensor networks) and sparse Bayesian inference for computer vision (face recognition & OCR).

Tuesday, August 21, 3:00PM
"High-Performance Computing and Feature Animation"

Entertainment has an ever-increasing appetite for high-performance computing. From the dazzling visual effects and compositing magic seen in live-action films to the virtual worlds of feature animation, computer graphics for entertainment requires significant computing power. In this talk we look at the computing requirements for high-end feature animation in particular and survey the applications, from rendering to physical simulation, that are the most demanding.
Examples will be drawn from recent DreamWorks releases including Shrek the Third.

Friday, April 20, 11:00 AM
"Overcoming Obstacles with Graph Based Programming Models"

Traditional approaches to implementing scalable applications are based on synchronous parallelism, and divide an application into distinct phases of communication and computation. I'll discuss an alternative approach based on graph-based execution, which treats data motion and computation as coupled, simultaneous activities. This model is more closely matched to applications that model asynchronous processes or that employ asynchronous execution, for example, to overlap communication with computation. Experiences with the Thyme and Tarragon run-time graph-based substrates reveal their ability to simplify the design of asynchronous algorithms, reduce data transfer overheads, or both. I'll also discuss some details of the APIs and the underlying run-time services, and discuss extensions of the model to GPU hardware.

Monday March 12, 2:00PM
"StatPatternRecognition: A C++ Package for Multivariate Classification"
Ilya Narsky, Caltech High Energy Physics

SPR implements various tools for supervised learning such as boosting, bagging, random forest, neural networks, decision trees, bump hunter (PRIM), and others. It is a standalone package with an optional dependency on Root for data input/output. SPR was crafted for the needs of the HEP community and is now being used by several HEP collaborations, as well as by a few non-HEP users. It is distributed under the GPL on SourceForge: http://sourceforge.net/projects/statpatrec/ . More info on the project is available from http://www.hep.caltech.edu/~narsky/spr.html .

Tuesday February 27, 2007, 2:00PM
"User-oriented supercomputing or how to increase language usability without sacrificing performance."
Professor Nikolay N. Mirenkov
Department of Computer Software, University of Aizu, Aizu-Wakamatsu, Japan

A new programming paradigm oriented to application programmers/researchers will be presented. It is based on "filmification" of computational methods and on an environment supporting the development of self-explanatory software components. Algorithmic cyberFilm is an abstraction combining both mathematical and physical concepts. It is a set of multimedia frames representing a variety of algorithmic features. These features are a basis for bridging the gap between "syntax and semantics" and understanding the corresponding component meaning. The self-explanatory concept is also an abstraction; however, it is intuitively much more understandable and allows employing a number of "fuzzy" views to represent the accurate meaning. CyberFilms as pieces of "active" knowledge are acquired in a film database. The cyberFilm frames are watchable and editable in a non-linear order according to the user's demands. Examples of algorithmic cyberFilms, as well as their compactness and understandability for users, will be presented. Basic features of a new type of supercompiler and how to preserve the performance of very-high-level language constructs will also be considered.