Caltech Center for Advanced Computing Research » Posts for tag 'clusters'

SHC Software Stack Upgrade – Update

Important Information for SHC Users

As of Sept 8, 2009, SHC has been transitioned to the new software stack (RHEL+OpenIB). There are currently 115 core4 nodes and 65 core8 nodes in production. For more information, please visit the SHC Getting Started / System Guide.

SHC Software Stack Upgrade

Important Information for SHC Users

Over the next couple of days, more backend nodes from shc-a will be transitioned to shc-[c,new]’s cluster of backend nodes, which runs the new software stack. By Sept 4, only 24 shc-a backend nodes will remain; all other compute nodes will be running the new software stack and will be visible from shc-[new,c].

  • Please port your codes to the new software environment if you have not already done so!
  • Please report any porting problems you are having; we will help as soon as possible.
  • Details on how to rebuild your code for the new SHC environment can be found here.
  • Your MPI-based code must be rebuilt for the new and improved SHC software stack.

The preventive maintenance window on Sept 8, from 0800 to 1400, will include testing of the complete transition of SHC compute and head node resources to the upgraded software stack. The fully upgraded production SHC cluster will consist of two head nodes (shc-[a,b]) and 1180 Opteron compute cores (163 dual-CPU/dual-core nodes + 66 dual-CPU/quad-core nodes).
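As a quick sanity check on the 1180-core figure, the arithmetic can be sketched in a few lines of Python. The node counts and CPU layout come from the announcement; the function and constant names are only illustrative.

```python
# Node counts as stated in the announcement.
DUAL_CORE_NODES = 163  # nodes with two dual-core CPUs
QUAD_CORE_NODES = 66   # nodes with two quad-core CPUs


def total_cores(dual_core_nodes, quad_core_nodes, cpus_per_node=2):
    """Return the total compute core count for a mix of node types."""
    cores_per_dual_node = cpus_per_node * 2  # 2 CPUs x 2 cores = 4
    cores_per_quad_node = cpus_per_node * 4  # 2 CPUs x 4 cores = 8
    return (dual_core_nodes * cores_per_dual_node
            + quad_core_nodes * cores_per_quad_node)


print(total_cores(DUAL_CORE_NODES, QUAD_CORE_NODES))  # 652 + 528 = 1180
```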

Questions or concerns about the upgrade? Just let us know.

SHC Cluster Expansion

CACR’s 163-node Shared Heterogeneous Cluster (SHC) has recently been expanded by 20 nodes. Each new node has 16 GB of memory and two quad-core 2.5 GHz AMD Opteron processors (model 2380). As with the existing SHC nodes, each new node is connected via InfiniBand to CACR’s InfiniBand switch.

The SHC provides computing capabilities specifically configured to meet the needs of applications from Caltech’s PSAAP, Turbulent Mixing, Applied and Computational Mathematics, and Numerical Relativity communities. For more information about the SHC, including information for test users of the new nodes, see this page.

CACR’s Shared Heterogeneous Cluster (SHC) Now Online

The nature of financial support for high-end computing resources has evolved with the widespread adoption of Beowulf clusters. Research groups that need computing often obtain funds for clusters as part of their grants. CACR participates in some of these efforts and supports significant dedicated resources for high-energy physics, astronomy, geophysics, physics-based simulation, and other fields. Unfortunately, the balkanization of computation under this model has created inefficiencies: the clusters do not take advantage of economies of scale, can be underutilized, and are often poorly administered.

CACR has developed a shared cluster model, and Professors Paul Dimotakis, Dan Meiron, and Kip Thorne have agreed to be pioneer partners in this effort. CACR has purchased a machine optimized for parallel numerical codes that can sustain over 1 trillion floating-point operations per second. It consists of 352 2.2 GHz AMD Opteron cores and 700+ gigabytes of memory, all interconnected by an InfiniBand networking fabric that can move 160+ gigabytes/s between the compute nodes.

The cluster is administered by CACR with funds from the partner groups, and each group has an allocation of time on the machine proportionate to its contribution. By sharing, the groups get better pricing from vendors, professional systems administration by experienced CACR staff, and the ability to use a much larger machine than each group could afford separately. Some of the partners are also supporting efforts at CACR in visualization and code tuning. The shared cluster model is extremely scalable, and CACR is interested in expanding the machine to increase simulation capability and add support for data-intensive science. Please contact CACR’s Executive Director, Mark Stalzer (stalzer at caltech.edu), for more information.
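The proportional allocation described above can be illustrated with a minimal sketch. This is not CACR's actual accounting method: the group names, contribution units, and the choice of core-hours per day as the unit of time are all assumptions made for the example. Only the 352-core count comes from the post.

```python
def allocate_core_hours(contributions, total_core_hours):
    """Split total_core_hours among groups in proportion to contribution.

    contributions: dict mapping a group name to its contribution,
    in any consistent unit (hypothetical values in this example).
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {
        group: amount * total_core_hours / total
        for group, amount in contributions.items()
    }


# Hypothetical partners and contribution units, for illustration only.
contributions = {"group_a": 2, "group_b": 1, "group_c": 1}

# 352 cores (from the post) x 24 hours = 8448 core-hours per day.
daily_core_hours = 352 * 24

for group, hours in allocate_core_hours(contributions, daily_core_hours).items():
    print(f"{group}: {hours:.0f} core-hours/day")
```

With these made-up numbers, a group that contributes half of the funds receives half of the daily core-hours.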