Hewlett Packard Itanium II Cluster (IT2)
Contact Information:
Status: Dedicated/Restricted
The HP Itanium II cluster provides online access to very large (~50TB) scientific data collections with a modest compute cluster tightly connected. Please see the Technical Summary, below, for details.
Caltech collaborators whose needs are commensurate with the capabilities and operational policies of the cluster are encouraged to apply for access.
Technical Summary
| Component | Description |
| Architecture | IA64 Linux Cluster |
| Access Nodes | 1 node, dual-processor, 6 GB ECC SDRAM memory per node |
| Compute Nodes | 17 nodes (34 processors), dual-processor, 6 GB ECC SDRAM memory per node; peak performance 177 Gflops |
| Processor | Intel® Itanium® 2, 1.3 GHz, integrated 3 MB L3 cache |
| Network Interconnect | Myrinet 2000, Gigabit Ethernet, Fiber Channel |
| Disk | 73 GB local scratch/node |
| Operating System | Linux 2.4.21-SMP (SuSE SLES 8.0) |
| Compilers | Intel: Fortran 77/90/95, C, C++; GNU: Fortran 77, C, C++ |
| Batch System | Portable Batch System (PBS) with Maui scheduler |
System Guide
Caltech’s tg-login cluster is composed of 17 Itanium 2 nodes, each with two 1.3 GHz processors and 6 Gbytes of memory. Each node runs SuSE Linux, and the nodes are interconnected with Myricom’s Myrinet network. The peak performance of the cluster is ~177 Gflops (10.4 Gflops per node), with a total memory of 102 Gigabytes and a total of 50 Terabytes of PVFS disk accessible over Gigabit Ethernet. Jobs are scheduled by the Maui scheduler and run by the PBS batch system.
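The peak figures above follow from simple arithmetic. The sketch below reproduces them, assuming each Itanium 2 processor completes 4 floating-point operations per cycle (an assumption, though it matches the 10.4 Gflops per node quoted above). The function names are illustrative only.

```shell
#!/bin/sh
# Peak Gflops = nodes * processors per node * clock (GHz) * flops per cycle.
# The default of 4 flops per cycle is an assumption, not a figure from this guide.
peak_gflops() {
    awk -v n="$1" -v p="$2" -v ghz="$3" -v fpc="${4:-4}" \
        'BEGIN { printf "%.1f\n", n * p * ghz * fpc }'
}

# Total memory in Gbytes = nodes * Gbytes per node.
total_memory_gb() {
    echo $(( $1 * $2 ))
}

peak_gflops 1 2 1.3      # one node
peak_gflops 17 2 1.3     # all 17 compute nodes
total_memory_gb 17 6     # total memory
```

Under that assumption the three calls print 10.4, 176.8 and 102, which is consistent with the rounded figures quoted in this guide.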
| Help and Information |
Please send email to tg-support@cacr.caltech.edu to report problems or ask questions.
| Software |
Below is a list of notable packages that are currently installed.
Numerical solvers, libraries, etc.
- /user/lib/
- ATLAS [3.4.1, 3.7.3]
- GOTO [0.99, 0.97, 0.95, 0.94]
- Petsc [2.1.6]
- Intel’s mkl (intel/mkl/lib/[32.64])
Storage System Related Goodies
- HDF [4-4.1r5, 4.2, S-1.4.5, 5-1.6.2-r1]
Compilers
Intel v8.0 (C, C++, f90)
- /usr/local/Intel/cc_80/bin/[icc, icpc]
- /usr/local/Intel/fc_80/bin/efc
| System Access |
Connect to the front end, tg-login.cacr.caltech.edu:
ssh -l username tg-login.cacr.caltech.edu
You should edit, compile, build and submit your compute node jobs on the front end.
NOTE: Password login to the head node of the cluster is not allowed; access is permitted only via ssh public keys. If you do not have a public key, please see our instructions on how to generate one.
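If you connect often, an entry in your ssh client configuration saves retyping the host name and login. The sketch below prints such a stanza; the `tg-login` alias, the login name and the key path are placeholders rather than values prescribed by this guide.

```shell
#!/bin/sh
# Print an ssh_config stanza for the front end.
# Arguments: login name, path to the private key (both are placeholders).
tg_login_stanza() {
    cat <<EOF
Host tg-login
    HostName tg-login.cacr.caltech.edu
    User $1
    IdentityFile $2
EOF
}

# Review the output, then add it to ~/.ssh/config by hand.
tg_login_stanza username '~/.ssh/id_rsa'
```

With the stanza in place, `ssh tg-login` would be equivalent to the longer command shown above.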
| PVFS File Transfer |
PVFS has a few optimized file transfer functions listed below:
% /usr/bin/pvfs2-[cp, ls, mkdir, touch]
| File Storage: Disk |
WARNING: It is your responsibility to back up critical data! PVFS is not backed up!
Each user has several areas of disk space for storing files for immediate use on Caltech’s tg-login cluster. These areas may have size or time limits for how long disk files may stay resident.
| /scratch | Local scratch on node, purged at job completion |
| Compiling and Porting MPI Programs |
Source code can be recompiled for the tg-login system with the following MPI wrapper commands:
mpicc [options] file.c (C and C++)
mpif90 [options] file.f (fixed form Fortran source code)
The following compilers are available on the tg-login cluster:
| Compiling: Numerical Libraries |
Intel has developed the Math Kernel Library (MKL), which contains most of the LAPACK and FFT routines. Users are encouraged to use these routines where applicable instead of their own, because they generally produce faster programs and have been tested for accuracy and correctness.
| Running: Interactive and Batch Jobs |
PBS is a batch processing utility; its jobs are scheduled by the Maui scheduler to help maximize processing throughput. To request interactive nodes for debugging, add the “-I” argument to qsub (see example 2, below).
Stdout/stderr files are temporarily stored in your $HOME/.pbs_spool directory while your job is running. Therefore, your home directory and the .pbs_spool subdirectory must have execute permissions for other users:
chmod o+x $HOME $HOME/.pbs_spool
Please do not change the permissions of your .pbs_spool directory after your jobs are in the queue, as this can confuse PBS and may prevent your job from terminating properly.
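Before submitting jobs, you may want to confirm that both directories already have the required permission. The following is a minimal sketch that reads the mode string printed by `ls -ld`; the function name is illustrative.

```shell
#!/bin/sh
# Succeeds if "others" have execute (search) permission on the directory.
# Character 10 of the mode string, e.g. "drwx--x--x", is the others-execute bit.
others_can_traverse() {
    mode=$(ls -ld "$1" 2>/dev/null | cut -c10)
    [ "$mode" = "x" ] || [ "$mode" = "t" ]
}

for d in "$HOME" "$HOME/.pbs_spool"; do
    if others_can_traverse "$d"; then
        echo "ok: $d"
    else
        echo "needs: chmod o+x $d"
    fi
done
```

A directory that does not exist yet is reported as needing the change, so create .pbs_spool first if it is missing.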
Often-used PBS commands and their functions are as follows:
NOTE: use the numerical portion of your jobid from PBS when using PBS commands. The alternative is to use qstat -f to obtain the full jobid. Please note that qstat -a and qstat print out a limited number of characters in the jobid field. This can result in a jobid string that is invalid.
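Full PBS job identifiers generally take the form of a sequence number followed by a dot and the server name. A small sketch for extracting the numerical portion follows; the identifier `12345.tg-master` is made up for illustration.

```shell
#!/bin/sh
# Strip everything from the first "." onward, leaving the numerical job number.
numeric_jobid() {
    echo "${1%%.*}"
}

# "12345.tg-master" is a hypothetical job identifier.
numeric_jobid "12345.tg-master"
```

An identifier that contains no dot is returned unchanged.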
The following is an example of a PBS batch script (each PBS command is followed by a comment line):
#!/bin/csh
#PBS -q dque                  # use default queue called "dque"
#PBS -N my_job                # current job name is "my_job"
#PBS -l nodes=10:ppn=2        # request 10 nodes and 2 processors per node
#PBS -l walltime=0:50:00      # reserve the requested nodes for 50 minutes
#PBS -o file.out              # standard output to a file called "file.out"
#PBS -e file.err              # standard error to a file called "file.err"
#PBS -A tg-account_string     # your accounting identification string
#PBS -V                       # export all my environment variables to the job
cd $HOME/test                 # change to my working directory
mpirun -v -machinefile $PBS_NODEFILE -np 20 ./a.out   # run my 20-way parallel job
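The value passed to -np in the script above is the number of requested nodes multiplied by the processors per node (10 x 2 = 20). The sketch below, written in Bourne shell rather than csh, shows two ways to derive that number instead of hard-coding it. It assumes the file named by $PBS_NODEFILE contains one line per allocated processor, which is typical of PBS but not stated in this guide; the function names are illustrative.

```shell
#!/bin/sh
# Number of MPI processes implied by a "nodes=N:ppn=P" request.
procs_from_request() {
    echo $(( $1 * $2 ))
}

# Number of non-empty lines in a PBS node file
# (assumed to be one line per allocated processor).
procs_from_nodefile() {
    grep -c . "$1"
}

procs_from_request 10 2

# Inside a job one could then write, for example:
#   mpirun -v -machinefile $PBS_NODEFILE -np `grep -c . $PBS_NODEFILE` ./a.out
```

Deriving the count from the node file keeps the mpirun line correct if the nodes or ppn request is later changed.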
Batch Queues
Currently only one queue, the default queue “dque”, is available for all jobs.
| Debugging Programs |
TotalView may be available for serial and parallel code debugging in the future. Use GNU gdb or ddd in the interim.
To compile your program for use with the TotalView debugger (when it becomes available), use the -g compile line option. For example:
mpicc -g do_mpi.c -o do_mpi
Documentation for TotalView is available at http://www.etnus.com/Products/TotalView/index.html.






