Center for Grid Technologies
at ISI/USC in Marina del Rey, California, USA
   
 

CGT Data Cluster

General

CGT's Data Cluster consists of 16 compute nodes and one head node (dc-user2.isi.edu). Jobs are managed by LSF and should be submitted through the Globus gatekeeper running on the head node. The compute nodes have the following characteristics:

  • Red Hat Linux 9.0 (same as dc-user2)
  • Dual Intel Pentium III 550 MHz CPUs
  • 1.5 GB RAM
  • Gigabit network connection
  • 85 GB of /scratch space
  • Access to Action home directories and other Action NFS file systems

Monitoring

A GRIS (Grid Resource Information Service) runs on the standard port on dc-user2. An LSF GRAM reporter is installed and provides information about jobs in the queueing system.
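A minimal sketch of querying the GRIS with the standard OpenLDAP ldapsearch client. It assumes that "the standard port" is the MDS 2 default, 2135, and that the GRIS answers under the usual MDS 2 base DN "Mds-Vo-name=local, o=Grid"; the function name gris_query and the DRY_RUN switch are hypothetical and not part of the cluster setup.

```shell
# Hypothetical helper: query the GRIS on the head node with ldapsearch.
# ASSUMPTION: the GRIS uses the MDS 2 default port (2135) and the default
# base DN "Mds-Vo-name=local, o=Grid". Override the variables if not.
GRIS_HOST="${GRIS_HOST:-dc-user2.isi.edu}"
GRIS_PORT="${GRIS_PORT:-2135}"
GRIS_BASE="${GRIS_BASE:-Mds-Vo-name=local, o=Grid}"

gris_query() {
    # Any arguments are passed to ldapsearch as an LDAP filter and
    # an optional list of attributes to return.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command that would be run.
        echo ldapsearch -x -LLL -H "ldap://$GRIS_HOST:$GRIS_PORT" -b "$GRIS_BASE" "$@"
    else
        # Anonymous (-x) query; -LLL prints plain LDIF without comments.
        ldapsearch -x -LLL -H "ldap://$GRIS_HOST:$GRIS_PORT" -b "$GRIS_BASE" "$@"
    fi
}

# Example: gris_query '(objectclass=*)'
```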

Ganglia metrics (such as CPU, memory, and network utilization) are also available at http://dc-master.isi.edu/ganglia/

Accessing the Cluster

dc-user2 is a general-purpose machine where jobs can be compiled and tested.

The head node, dc-user2.isi.edu, is accessible through ssh, but access is restricted to CGT. Like any ISI host, dc-user2 only accepts ssh public key authentication. ISI accounts are not enabled on dc-user2 by default, so if you need access, send an email to cgt-support@isi.edu.

GridFTP

Each compute node currently runs a GridFTP server. This will probably change in the future, because the GridFTP daemon and its bandwidth usage can interfere with the job scheduled on the node. Please do not write jobs that assume GridFTP is running on the compute nodes.
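Because jobs should not rely on GridFTP on the compute nodes, one alternative is to stage data through the NFS-mounted home directories and the /scratch space listed under General. A minimal sketch, assuming your input file lives under your NFS home directory; the function name run_in_scratch, its argument order, and the SCRATCH_ROOT override are hypothetical.

```shell
# Hypothetical helper: copy an input file from NFS into a private
# directory under /scratch, run a command there, copy the named output
# file back to an NFS directory, and remove the scratch directory.
# Arguments: input file, output file name, NFS output dir, command...
# ASSUMPTION: SCRATCH_ROOT defaults to /scratch (overridable for testing).
run_in_scratch() {
    input=$1
    output=$2
    outdir=$3
    shift 3                      # the remaining arguments are the command

    workdir=$(mktemp -d "${SCRATCH_ROOT:-/scratch}/job.XXXXXX") || return 1
    cp "$input" "$workdir/" || { rm -rf "$workdir"; return 1; }

    (cd "$workdir" && "$@")      # run the command inside the scratch dir
    status=$?

    # Copy the result back even if the command failed, then clean up.
    [ -f "$workdir/$output" ] && cp "$workdir/$output" "$outdir/"
    rm -rf "$workdir"
    return "$status"
}

# Example: run_in_scratch "$HOME/data/in.txt" out.txt "$HOME/results" ./analyze in.txt
```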

Queues for Running Jobs

The following table describes the available queues:

Queue      Description
low        Low-priority jobs. Use this queue for long-running jobs, for example jobs that run overnight.
normal     Normal-priority jobs. Use this queue for most jobs. This is the default queue.
high       High-priority jobs. Access is restricted.
exclusive  Jobs that need exclusive access to a node, rather than being scheduled for maximum throughput. This queue can be useful for running benchmark applications.

Use dc-user2.isi.edu/jobmanager-lsf as the resource string when submitting jobs to the cluster.
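The queue is selected with the queue attribute in the job's RSL. A minimal sketch of a wrapper that rejects queue names not listed in the queue table before building the RSL string; the function name make_rsl is hypothetical, and it only emits the queue, count, jobtype and executable attributes that this page already uses.

```shell
# Hypothetical helper: build an RSL string for the LSF job manager,
# refusing queue names that are not in the queue table.
# Arguments: queue name, number of processes, executable path.
make_rsl() {
    queue=$1; count=$2; executable=$3
    case "$queue" in
        low|normal|high|exclusive) ;;
        *) echo "unknown queue: $queue" >&2; return 1 ;;
    esac
    # jobtype=multiple starts $count copies of the executable.
    printf '&(queue=%s)(count=%s)(jobtype=multiple)(executable=%s)\n' \
        "$queue" "$count" "$executable"
}

# Example:
# globusrun -o -r dc-user2.isi.edu/jobmanager-lsf "$(make_rsl normal 4 /bin/hostname)"
```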

Example Condor-G submit file:

# always use the globus universe with Condor-G
universe = globus

# use the full path to the executable on the cluster
executable = /nfs/asd/rynge/jobs/myjob.sh

# do not transfer the executable; if this is true,
# the executable is transferred from the current
# directory
transfer_executable = false

# where the job should be submitted and
# which job manager to use
globusscheduler = dc-user2.isi.edu/jobmanager-lsf

# additional Globus RSL
globusrsl = (jobtype=multiple)(count=4)(queue=normal)

# output, error, and Condor-G log files
output = job.$(cluster).out
error = job.$(cluster).err
log = job.$(cluster).log

# now queue the job
queue
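If you submit many similar jobs, the submit file can be generated by a shell function instead of edited by hand. A minimal sketch, where write_submit is a hypothetical helper that prints the same settings as the example submit file, with the executable, queue, and process count as arguments; the output is then passed to condor_submit.

```shell
# Hypothetical helper: print a Condor-G submit description for the
# cluster's LSF job manager.
# Arguments: executable path on the cluster, queue name, process count.
write_submit() {
    # \$(cluster) is escaped so Condor, not the shell, expands it.
    cat <<EOF
universe = globus
executable = $1
transfer_executable = false
globusscheduler = dc-user2.isi.edu/jobmanager-lsf
globusrsl = (jobtype=multiple)(count=$3)(queue=$2)
output = job.\$(cluster).out
error = job.\$(cluster).err
log = job.\$(cluster).log
queue
EOF
}

# Example:
# write_submit /nfs/asd/rynge/jobs/myjob.sh normal 4 > myjob.sub
# condor_submit myjob.sub
```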

Available software

Software common to the whole cluster is installed in /cluster. Sourcing a package's setup.sh or setup.csh file gives you the right environment for that piece of software. Each package also has a 'default' symlink that points to its current version.

For example, to use Globus from a Bourne shell (sh or bash), run:

. /cluster/globus/default/setup.sh         

or, for csh:

source /cluster/globus/default/setup.csh         
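For Bourne-shell users who load several packages, the pattern above can be wrapped in a small function. A minimal sketch; the function name use_pkg and the CLUSTER_ROOT override are hypothetical. Because sourcing must change the current shell, the function has to be defined in that shell (for example in ~/.profile), not run as a separate script.

```shell
# Hypothetical helper: source the 'default' setup.sh of a /cluster
# package into the current shell.
# ASSUMPTION: CLUSTER_ROOT defaults to /cluster (overridable for testing).
use_pkg() {
    root="${CLUSTER_ROOT:-/cluster}/$1"
    if [ ! -f "$root/default/setup.sh" ]; then
        echo "no default setup.sh for $1 under $root" >&2
        return 1
    fi
    # The dot command runs setup.sh in the current shell, so the
    # variables it sets stay in effect after use_pkg returns.
    . "$root/default/setup.sh"
}

# Example: use_pkg globus
```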

FAQ

What CAs are trusted?

We trust most of the common CAs; if one you need is missing, feel free to suggest it. The trusted CAs are:

  • Globus Alliance
  • USC
  • UK eScience
  • NPACI
  • NCSA
  • DOE
  • TACC

How do I get into the grid-mapfile?

Send an email containing your certificate subject (distinguished name) to cgt-support@isi.edu.

How do I make jobs run on specific nodes in the cluster?

We have added a 'host_selection' RSL parameter that maps to the LSF host selection option of bsub (-m).

The value is a list of hostnames separated by spaces.

For example:

globusrun -o -r dc-user2.isi.edu/jobmanager-lsf \
	'&(queue=exclusive)(count=2)(host_selection="dc-n12 dc-n13")(executable=/bin/hostname)'
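When the node list comes from a script, the space-separated host_selection value can be built from the arguments. A minimal sketch; the function name host_rsl is hypothetical, and setting count to the number of listed nodes is an assumption that mirrors the two-node example above.

```shell
# Hypothetical helper: build an exclusive-queue RSL string that pins the
# job to the given nodes via host_selection.
# Arguments: executable, then one or more node names.
# ASSUMPTION: one process per listed node (count = number of nodes).
host_rsl() {
    executable=$1; shift
    hosts="$*"      # "$*" joins the node names with single spaces
    printf '&(queue=exclusive)(count=%s)(host_selection="%s")(executable=%s)\n' \
        "$#" "$hosts" "$executable"
}

# Example:
# globusrun -o -r dc-user2.isi.edu/jobmanager-lsf "$(host_rsl /bin/hostname dc-n12 dc-n13)"
```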

Why do short jobs perform so poorly?

For a job that runs only a couple of minutes, the Globus and scheduling overhead takes up a large share of the total time.

Try to group smaller tasks into one longer job.
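One way to group tasks is to submit a single wrapper that runs a list of short commands one after another, so the overhead is paid once per job instead of once per task. A minimal sketch; the function name run_tasks and the one-command-per-line task file format are hypothetical.

```shell
# Hypothetical wrapper: run each line of a task file as a shell command,
# in order, inside one job. Blank lines are skipped. The return status
# is the number of failed tasks (capped at 255).
run_tasks() {
    failed=0
    while IFS= read -r task || [ -n "$task" ]; do
        [ -z "$task" ] && continue
        # </dev/null stops a task from consuming the rest of the task file.
        sh -c "$task" </dev/null || failed=$((failed + 1))
    done < "$1"
    [ "$failed" -gt 255 ] && failed=255
    return "$failed"
}

# Example: put 'run_tasks tasks.txt' in the job script submitted to LSF.
```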

Can I have SSH access to the compute nodes?

No!

I want a Globus gatekeeper on each node. How do I do that?

Here is an example of how to run personal gatekeepers on the cluster.

To start the gatekeepers, run:

globusrun -o -r dc-user2.isi.edu/jobmanager-lsf \
	-f /nfs/asd/rynge/jobs/personal-gatekeeper/job.rsl 

When submitting jobs to a personal gatekeeper, you must specify the subject you expect the resource to authenticate with. For a personal gatekeeper this is not the subject of a host certificate, but the subject of your own user certificate.

For example:

globusrun -a -r \
	dc-n2.isi.edu::'/C=US/O=NPACI/OU=SDSC/CN=Mats Rynge/USERID=ux454281'
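To check several personal gatekeepers at once, the host::subject contact string from the example can be built per node and passed to globusrun -a. A minimal sketch; the function names personal_contact and ping_gatekeepers are hypothetical, and the subject is expected in the slash-separated form shown above.

```shell
# Hypothetical helper: build a host::subject contact string so that
# globusrun expects your user certificate subject, not a host subject.
personal_contact() {
    case "$2" in
        /*) ;;
        *) echo "subject must look like /C=.../CN=..." >&2; return 1 ;;
    esac
    printf '%s::%s\n' "$1" "$2"
}

# Hypothetical helper: authenticate (-a) against the personal
# gatekeeper on each listed node. Arguments: subject, then node names.
ping_gatekeepers() {
    subject=$1; shift
    for node in "$@"; do
        contact=$(personal_contact "$node" "$subject") || return 1
        globusrun -a -r "$contact"
    done
}

# Example:
# ping_gatekeepers "$(grid-cert-info -subject)" dc-n2.isi.edu dc-n12.isi.edu
```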

See job.sh and job.rsl in /nfs/asd/rynge/jobs/personal-gatekeeper/ for more information.

job.sh takes a single argument: the number of seconds to keep the gatekeepers running.



Center for Grid Technologies (CGT)
USC/ISI
4676 Admiralty Way
Marina del Rey, CA 90291