The Broadcom Distributed Unified Cluster (BDUC) is, as the name suggests, a distributed group of computing nodes unified by running under a single Sun Grid Engine (SGE) Resource Manager. BDUC consists of 3 subclusters: 1 subcluster of 40 dual-core Intel x86 nodes running 32-bit/CentOS 5 in the OIT Academic Data Center, 1 subcluster of 40 dual-core AMD64 Opteron nodes running 64-bit/CentOS 5 in the ICS data center, and 1 subcluster of 4 dual-processor dual-core AMD64 Opteron nodes with 32 GB RAM running 64-bit Kubuntu Ibex (8.10) Desktop Edition.
The latter subcluster, known as Broadcom EA Replacement (BEAR), is administered especially for interactive use and, consequently, will have a different, larger set of applications. You have access to the full graphical KDE desktop via VNC, as well as the individual GUI applications and shell utilities. One of the nodes (claw1) is reserved for only interactive use; the 3 others can be used for both interactive and batch runs (currently limited to 48hrs) on the claws queue.
By default, BDUC is open to all UCI faculty, graduate students, and post-graduate researchers. Each new user must send an e-mail request to bduc-request@uci.edu to activate his/her UCInetID for use on the cluster.
Use your UCINetID and associated password to log into the login node
(bduc.nacs.uci.edu) via ssh:
ssh -Y your_UCINetID@bduc.nacs.uci.edu
(On MacOSX and Linux, ssh is installed by default. On Windows, use the free PuTTY application or one of these alternatives. Telnet access is NOT available; and, if you use ssh without the -Y or -X option, you won't be able to view X11 graphics - see below).
Setting up ssh keys will allow you to ssh/scp to frequently used hosts without entering a passphrase each time. The procedure below works on Linux and Mac only; Windows clients can do it as well, but the procedure is different.
In a terminal on your Mac or Linux machine, type:
# for no passphrase, use
ssh-keygen -b 1024 -N ""
# if you want to use a passphrase:
ssh-keygen -b 1024 -N "your passphrase"
When prompted, save the keys to the default locations.
Now, use the ssh-copy-id command, normally included as part of your ssh distribution. This does all the copying in one shot, using your ~/.ssh/id_rsa.pub key by default (use the -i option to specify another identity file, say ~/.ssh/id_dsa.pub if you're using DSA keys):
ssh-copy-id your_bduc_login@bduc.nacs.uci.edu # you'll have to enter your password one last time to get it there.
What this does is scp id_rsa.pub to the remote host (the ssh server you're trying to log into) and append that key to the remote ~/.ssh/authorized_keys. If things don't work, check that the contents of id_rsa.pub have been appended correctly.
Then verify that it's worked by ssh'ing to BDUC. You shouldn't have to enter a password anymore.
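If the login still asks for a password, the "appended correctly" check can be scripted. The following is a minimal sketch, assuming the default RSA key location; key_is_authorized is a made-up helper name, not part of ssh. It reads the contents of an authorized_keys file on standard input and looks for your public key as a whole line.

```shell
#!/bin/sh
# key_is_authorized [PUBKEY_FILE]
# Reads an authorized_keys file on stdin and succeeds (exit 0) if the key
# line stored in PUBKEY_FILE appears in it as a complete line.
# Hypothetical helper for illustration; defaults to ~/.ssh/id_rsa.pub.
key_is_authorized() {
    pub=${1:-"$HOME/.ssh/id_rsa.pub"}
    [ -r "$pub" ] || return 2
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -qxF -- "$(cat "$pub")"
}

# Usage from your own machine (you may be asked for your password):
#   ssh your_bduc_login@bduc.nacs.uci.edu 'cat ~/.ssh/authorized_keys' \
#       | key_is_authorized && echo "key is installed"
```

If the helper reports that the key is installed but you are still asked for a password, the problem is probably elsewhere (file permissions on ~/.ssh are a common culprit).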
This will give you access to a Linux shell (bash by default; tcsh and ksh are also available) and to the resources of the BDUC via the SGE commands. The most frequently used commands will be qrsh to request an interactive node and qsub to submit a batch job. You can also check the status of various resources with the qconf command, and you can get an idea of the hosts that are currently up by issuing the qhost command. See the SGE cheatsheet (PDF format) for common SGE q commands and options.
The login node should be considered your first stop in doing real work. You can copy files to and from your home directory from the login node, but you should not do any compiling nor run any jobs on the login node. If you do and we notice, we'll kill them off. To fully utilize the capabilities of the cluster, you should request an interactive node or submit a batch job, like this:
# for a 32 bit interactive node
hmangala@bduc-login:~ $ qrsh -q int32
# wait a few seconds...
hmangala@bduc-i32-3:~
#        ^^^^^^^^^^ note login on remote node.
# for a 64 bit interactive node
hmangala@bduc-login:~ $ qrsh -q int_a64
# wait a few seconds...
hmangala@bduc-amd64-2:~
The most direct, most widely available way to move files to and from BDUC is via secure copy, i.e. scp. Besides the commandline scp utility bundled with all Linux and Mac machines, there are GUI clients for MacOSX and Windows, and of course, Linux. If you have large collections of files or large individual files that change only partially, you might be interested in using rsync as well.
For Windows users, we recommend the free WinSCP application, which gives you a graphical interface for SCP, SFTP and FTP.
For Mac OS X users, we recommend the free, though oddly named, Cyberduck application, which provides graphical file browsing via FTP, SCP/SFTP, WebDAV, and even Amazon S3(!).
For Linux and UNIX users, we recommend using the built-in capabilities of KDE's Swiss Army knife browser Konqueror or twin panel file manager Krusader which both support the secure file browser kio-plugin called fish. If you use a fish URL, you can connect the server via shared keys or via password:
fish://hmangala@bduc.nacs.uci.edu
Advanced users should read the document HOWTO_move_data, which discusses in detail how to transfer large amounts of data over the network.
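For commandline users, the rsync approach mentioned above might look like the following minimal sketch. The helper name bduc_push and the BDUC_USER and DRY_RUN variables are invented for this example; the rsync flags themselves (-a archive mode, -v verbose, -z compress, -e ssh to run over ssh) are standard.

```shell
#!/bin/sh
# bduc_push LOCAL_PATH [REMOTE_DIR]
# Copy LOCAL_PATH into REMOTE_DIR on the BDUC login node using rsync over
# ssh. On repeat runs rsync sends only what has changed.
# BDUC_USER and DRY_RUN are illustrative variables, not BDUC conventions.
bduc_push() {
    src=$1
    dest=${2:-.}
    user=${BDUC_USER:-your_UCINetID}
    target="$user@bduc.nacs.uci.edu:$dest"
    if [ -n "${DRY_RUN:-}" ]; then
        # Show the command instead of running it
        echo "rsync -avz -e ssh $src $target"
    else
        rsync -avz -e ssh "$src" "$target"
    fi
}

# Usage:
#   BDUC_USER=hmangala bduc_push data/ results
# The plain scp equivalent for a single file would be:
#   scp myfile hmangala@bduc.nacs.uci.edu:results/
```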
On the login node, you shouldn't do anything too strenuous (computationally). If you run something that takes more than a minute to complete, you should run it on an interactive node or submit it to one of the batch queues.
qrsh given alone will start an ssh -Y session with one of the nodes in the interactive Q.
qrsh given with an explicit Q (qrsh -q int64) will request a particular Q and therefore a particular architecture. Currently, there are 2 architectures (AMD64 & Intel32), both configured similarly.
Logging on to an interactive node may be all that you need. If you want to slice & dice data interactively, either with a graphical app like MATLAB, VISIT, JMP, or clustalx, or a commandline app like nco or scut or even hybrids like gnuplot or R, you can run them from any of the interactive nodes, read, analyze and save data to your /home directory. As long as you satisfy the graphics requirements, you can view the output of the X11 graphics programs as well.
If you have jobs that are very long or require multiple nodes to run, you'll have to submit jobs to a Q. qsub [job_name.sh] will submit the job described by job_name.sh to SGE, which will look for an appropriate Q and then start the job running via that Q. For example, if you need an AMD64 architecture, you can request it explicitly: qsub -q longbat64 job_name.sh, which will try to run it on the least loaded AMD64 machine.
Once you log into the login node (via ssh -Y your_UCINetID@bduc.nacs.uci.edu), you can get an idea of the hosts that are currently up by issuing the qhost command. Running qstat alone will tell you the status of your own jobs, while
qstat -u '*'
will tell you the status of all jobs currently queued or running. See the SGE cheatsheet (PDF format) for common SGE q commands and options.
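When the cluster is busy, the full qstat listing can be long. Here is a minimal sketch that summarizes it by job state. It assumes the default qstat layout (two header lines, then one row per job with the state, such as r or qw, in the fifth column); count_job_states is a made-up helper name.

```shell
#!/bin/sh
# count_job_states: read qstat output on stdin and print one
# "STATE COUNT" line per job state, sorted by state.
# Assumes the default qstat layout: two header lines, then one row per
# job with the state in the fifth whitespace-separated column.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage on the cluster:
#   qstat -u '*' | count_job_states
```

If your site's qstat prints extra columns or a different header, adjust the column number accordingly.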
The shell script that you submit (job_name.sh above) should be written in either bash or csh and should completely describe the job, including where the inputs and outputs are to be written (if not specified, the default is your home directory). The following is a simple shell script that requests the Bourne shell (/bin/sh) as the job environment, calls date, waits 20s and then calls date again.
#!/bin/sh
# (c) 2008 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.
# This is a simple example of a SGE batch script
# request Bourne shell as shell for job
#$ -S /bin/sh
# print date and time
date
# Sleep for 20 seconds
sleep 20
# print date and time again
date
A slightly more elaborate one is here. And a much more elaborate one is here.
The queues have been reorganized for clarity. They are now organized as follows:
Queue        Time     Nodes
==============================
int32        1hr *    2
int64        1hr *    2
quickbat32   12hr     4
quickbat64   12hr     4
longbat32    100hr    34
longbat64    100hr    34

* For interactive use, you have 1 hr of aggregate CPU time (not
  wallclock time). These Qs are interactive only; you can't
  submit batch jobs to them.
To submit short jobs (<12hr), the easiest approach is to not specify a Q at all; the job will go to any available batch Q. To run on a longbatxx Q, either specify the estimated runtime in the submission script by including the -l h_rt parameter
#$ -l h_rt=00:30:00 #30 min run
(also see below)
or submit specifically to one of the longbatxx Qs.
For example:
$ qsub -q longbat64 yourshellname.sh
# or include the Q spec in the script:
#$ -q longbat64
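If you submit jobs of varying length, choosing the Q from the estimated runtime can be scripted. The following is a minimal sketch based on the time limits in the table above; pick_queue is a made-up helper name, and the Q names and limits are only as current as that table.

```shell
#!/bin/sh
# pick_queue HOURS [ARCH]
# Print the batch Q whose time limit covers a job of HOURS whole hours
# on ARCH (32 or 64, default 64), using the limits from the Q table:
# quickbatNN up to 12hr, longbatNN up to 100hr.
# Hypothetical helper for illustration.
pick_queue() {
    hours=$1
    arch=${2:-64}
    case $arch in
        32|64) ;;
        *) echo "unknown architecture: $arch" >&2; return 1 ;;
    esac
    if [ "$hours" -le 12 ]; then
        echo "quickbat$arch"
    elif [ "$hours" -le 100 ]; then
        echo "longbat$arch"
    else
        echo "no Q allows $hours hours" >&2
        return 1
    fi
}

# Usage on the cluster:
#   qsub -q "$(pick_queue 50 64)" -l h_rt=50:00:00 yourshellname.sh
```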
#$ -N job_name         # this name shows in qstat
#$ -S /bin/bash        # run with this shell
#$ -q longbat64        # run in this Q
#$ -l h_rt=50:00:00    # need 50 hour runtime
#$ -l mem_free=2G      # need 2GB free RAM
#$ -l scr_free=XG      # need X GB scratch space
#$ -pe mpich 4         # define parallel env
#$ -cwd                # run the job in the directory it was submitted from
#$ -o job_name.out     # write stdout to this file
#$ -e job_name.err     # write stderr to this file
#                        (-j y will merge stdout and stderr)
#$ -notify             # warn the job before it is suspended or killed
#$ -M user@uci.edu     # send mail about this job to the given email address
#$ -m beas             # send mail to owner when the job
#                        begins (b), ends (e), is aborted (a),
#                        or is suspended (s)
(For more on SGE shell scripts, see here: http://nbcr.sdsc.edu/pub/wiki/index.php?title=Sample_SGE_Script)
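To show how these parameters fit together, here is a minimal sketch of a complete submission script. The job name, the output file names, and the placeholder "work" are invented for illustration. JOB_ID and NSLOTS are environment variables that SGE sets for a running job; the fallback values exist only so that you can try the script outside the cluster.

```shell
#!/bin/bash
# Name shown in qstat
#$ -N example_job
# Run the job with bash
#$ -S /bin/bash
# Run in this Q, with a 50 hour runtime estimate
#$ -q longbat64
#$ -l h_rt=50:00:00
# Run in the directory the job was submitted from
#$ -cwd
# Merge stderr into stdout and write both to one file
#$ -j y
#$ -o example_job.out

# SGE sets JOB_ID and NSLOTS inside a running job; the defaults below
# are only used when the script is run by hand.
job_id=${JOB_ID:-local}
slots=${NSLOTS:-1}

echo "job $job_id started on $(uname -n) with $slots slot(s)"
date

# Placeholder for the real work: write a small result file into the
# current working directory.
out="result.$job_id.txt"
echo "finished on $(uname -n)" > "$out"

date
```

You would submit it with qsub example_job.sh and then watch it with qstat.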
If you need to run an MPI parallel job, you can request the needed resources by Q as well by specifying the resources inside the shell script (more on this later) or externally via the -q and -pe flags (type man sge_pe on one of the BDUC nodes).
The ROCKS group has a very good SGE Introduction from the User's perspective. Ignore the ROCKS-specific bits.
Googling "Sun Grid Engine" is a good, easy start.
Chris Dagdigian's SGE site is very good and has an excellent wiki.
The official Sun Grid Engine site has a lot of good links, especially the HOWTOS
The SGE docs are the final word, but there are a lot of pages to cover.
All the interactive nodes will have the full set of X11 graphical tools and libraries. However, since you'll be running remotely, any application that requires OpenGL, while it will probably run, will run so slowly that you won't want to run it for long. If you have an application that requires OpenGL, you'll be much better off downloading the processed data to your own desktop and running the application locally.
In order to have access to these X11 tools via Linux, your local Linux must have the X11 libraries available. Unless you have explicitly excluded them, all modern Linux distros include X11 runtime libraries. Don't forget to use the -Y flag when you connect using ssh to tunnel the X11 display back to your machine:
ssh -Y you@bduc.nacs.uci.edu
The MacOSX installation DVDs come with a free, Apple-certified X11 installation. On Leopard it's in Optional Installs -> Optional Installs.mpkg. All you have to do is install it and start it running in the background to accept the X11 windows (Applications -> Utilities -> X11). As above, use the -Y flag when you connect with ssh.
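Once you are logged in, a quick first check of whether X11 forwarding was requested is to look at the DISPLAY variable, which ssh -Y (or -X) sets on the remote side. This is a minimal sketch; x11_forwarding_active is a made-up helper name, and a non-empty DISPLAY does not by itself guarantee that your local X server is accepting windows.

```shell
#!/bin/sh
# x11_forwarding_active: succeed if DISPLAY is set to a non-empty value,
# which is what ssh -Y (or -X) arranges on the remote host.
# Hypothetical helper for illustration.
x11_forwarding_active() {
    [ -n "${DISPLAY:-}" ]
}

if x11_forwarding_active; then
    echo "DISPLAY is $DISPLAY; X11 programs should be able to open windows"
else
    echo "DISPLAY is not set; reconnect with: ssh -Y you@bduc.nacs.uci.edu"
fi
```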
BDUC is still in beta roll-out mode so many applications, libraries, utilities, editors, etc aren't available. When you find something missing or a behavior that seems odd, please let us know. You can email the BDUC admins at bduc-request@uci.edu.