
Information for SHC Users Transitioning to the RHEL SHC Environment (August 2009)

If you have any questions, problems, or feedback, please contact us at:

This documentation is intended for SHC users who have accounts on the shc-[a,b].cacr.caltech.edu cluster.

The software stack on SHC is being upgraded. The Linux distribution is changing from SLES to RHEL, and the message-passing libraries will be based on OpenIB and OpenMPI v1.3.3. MPI-based applications built on shc-[a,b] will *not* run in the new SHC environment; you must rebuild your MPI-based applications. Below are details for transitioning from the shc-[a,b] SLES environment to the improved RHEL-based SHC software stack.

The RedHat sub-cluster to which you will be transitioning has its own head node (shc-c.cacr.caltech.edu).

The number of compute nodes will increase from its current count of 117 to a total of 229 by September. This will be accomplished by transitioning more nodes from the SLES shc-[a,b] cluster to the RHEL shc-c cluster.

Each user with an account on the SLES cluster also has an account on the RHEL shc-c cluster. Your home directory on the RHEL cluster contains only the following files/directories: a copy of your .ssh directory from the SLES cluster (so you should be able to log in), and a symlink ("SLES") which points to your SLES home directory. This link to the SLES home directory enables you to move (or copy) the files you need for running on the RedHat nodes into your new RHEL home directory.

You will probably want to copy your .bashrc, .bash_profile, .tcshrc, .login (and any other files that you will need) into your new RHEL home directory, as we have not copied these for you. Please note that after copying you may need to make minor tweaks to these files so that they are appropriate to the RedHat environment.
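For example, copying your shell startup files through the SLES symlink might look like the following; the exact file list is only an illustration and depends on which shells and dotfiles you actually use:

 # on shc-c, from your new RHEL home directory
 cd $HOME
 cp SLES/.bashrc SLES/.bash_profile .     # bash users
 cp SLES/.tcshrc SLES/.login .            # tcsh users
 # then edit the copies for any RedHat-specific adjustments (paths, aliases, etc.)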

The familiar 'pkgs' command is available on shc-c, reflecting the new software stack and easing your rebuild experience. The MPI wrappers (e.g., mpicc) should take care of most, if not all, of the changes necessary to build your MPI code in the new shc-c environment.
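As a rough sketch, rebuilding usually amounts to recompiling with the wrapper compilers; the source and executable names below are placeholders:

 # rebuild against the new OpenMPI stack on shc-c
 mpicc  -O2 -o my_app my_app.c        # C sources (placeholder names)
 mpif90 -O2 -o my_app my_app.f90      # Fortran sources, if applicable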

Job submission on the shc-c cluster of RedHat nodes differs only slightly from job submission on shc-[a,b], but the differences are significant.

  • Short (< 2 hour) jobs are given scheduling priority on 6 "core4" nodes Mon-Fri, 8am to 6pm.
  • There is no "weekend" queue on shc-c.
  • Available queues on shc-c are "productionQ", "weekdayQ", "weekendQ", and "dedicatedQ". The philosophy, though, is that you should generally not have to specify a queue. Instead, specify the resources you need and your job will be routed to the proper queue.
  • To accommodate dual-processor, quad-core nodes as well as dual-processor, dual-core nodes in the shc environment, resource specification has changed a bit compared to shc-[a,b]'s environment. The syntax is:
-l nodes=NN[:<type>][:ppnval][+MM[:<type>][:ppnval]]...

  NN, MM   Number of nodes of the specified type required by the job; default is 1.

  type     Node type: "core8" for eight-core nodes and "core4" for four-core nodes; default value is "core4".

  ppnval   Number of processes that will be run on each node, specified as "ppn=K", where "K" is a value not exceeding the core count; default value is the number of cores the allocated nodes have.

  • Jobs submitted with a runtime of > 12 hours will automatically be routed to the weekendQ.
  • dedicatedQ runs require special approval.
  • weekdayQ runs are M-F, 0800 to 1700.
  • weekendQ runs are F-M, 1700 to 0800; Caltech holidays extend the active window for the weekendQ.

Examples of job submission on shc-c:

A standard, single-node, 4-way MPI executable which needs to run for 30 minutes on a four-core node:

 qsub -l walltime=00:30:00 -l nodes=1:core4:ppn=4 jobA
 qsub -l walltime=00:30:00 -l nodes=1:core4 jobA
 qsub jobA

All of the above are equivalent.

Equivalent ways to submit a six-node, 48-way jobB which needs to run for 1.5 hours on eight-core nodes:

 qsub -l walltime=1:30:00 -l nodes=6:core8:ppn=8 -q productionQ jobB
 qsub -l walltime=1:30:00,nodes=6:core8 jobB

Equivalent ways to submit a job that should run on a mixture of node types (three eight-core nodes plus six four-core nodes) for four hours:

 qsub -l walltime=4:00:00 -l nodes=3:core8+6:core4 jobC
 qsub -l walltime=4:00:00,nodes=3:core8+6:core4 jobC
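For longer runs you may prefer to put the resource requests in a batch script rather than on the qsub command line. The sketch below is only an illustration: the script, executable, and job names are placeholders, and depending on how OpenMPI is configured on shc-c you may or may not need to hand mpirun the node file explicitly.

 #!/bin/bash
 #PBS -N jobB
 #PBS -l walltime=1:30:00
 #PBS -l nodes=6:core8:ppn=8

 cd $PBS_O_WORKDIR
 # OpenMPI built with Torque/PBS support normally detects the allocated nodes
 # on its own; otherwise point it at the node file explicitly:
 mpirun -np 48 ./my_mpi_app
 # or: mpirun -hostfile $PBS_NODEFILE -np 48 ./my_mpi_app

The script would then be submitted with qsub (e.g., qsub jobB.pbs); resource options given on the qsub command line override the corresponding #PBS directives.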