We have access to the Sherlock cluster. Documentation for Sherlock is available here.
As a member of the Daru lab, you have access to lab-specific resources on Sherlock, including compute partitions and scratch storage.
Use SSH from a terminal with your SUNet ID to log in.
# Connect to Sherlock from your local computer
ssh <user>@login.sherlock.stanford.edu
On Sherlock, you can access several partitions, but the ones commonly used in the Daru lab are the "hns" and "normal" partitions. Use your user-specific scratch directory to run your jobs. This is also where you can temporarily store large data files, but remember that files on SCRATCH and GROUP_SCRATCH are automatically purged 90 days after their last content modification.
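A minimal sketch of getting into your scratch space and seeing which partitions are visible to your account (the $SCRATCH environment variable is set for you on Sherlock; sinfo is standard Slurm, so the exact partition list will depend on your group memberships):
# Move into your personal scratch directory
cd $SCRATCH
# List the partitions visible to your account
sinfo -o "%P" | sort -u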
To transfer files from your local computer to the cluster, you can use sftp, or you can download data directly on the cluster if it is hosted online somewhere.
# On your local computer
# Transfer files or dirs from your local computer to the scratch space
sftp <user>@dtn.sherlock.stanford.edu
put /path/on/your/computer/file.csv
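If the data is hosted online, you can skip the local transfer and download it straight into your scratch space from the cluster instead (the URL below is just a hypothetical placeholder):
# On Sherlock: download a remote file directly into your scratch space
wget -P $SCRATCH https://example.org/data/occurrences.csv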
The Sherlock cluster uses the Slurm scheduler to manage shared resources on the cluster. When you log in, you start on the login ("head") node, which is like a waiting area; do not run any jobs on this node. Instead, from your SCRATCH directory, submit a "job script" that reserves resources for your job and sends it to run on a "compute node".
To keep things organized, create a set of "Batch" directories, one per job submission. This way, you can easily keep track of your tasks.
# In your SCRATCH directory (on the head node)
cd $SCRATCH
mkdir Batch{1..10}
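The job script below expects each Batch* directory to contain a copy of calibration.R. A quick way to distribute it, assuming the R script sits in your current directory:
# Copy the R script into every Batch directory
for DIR in Batch*; do cp calibration.R "$DIR"/; done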
The sbatch script below tells the scheduler which resources we need, which partition to use ("hns"), and how the job's output file should be named. It reserves 16 cores on one node and uses R CMD BATCH to run the R script calibration.R inside the Batch* directory passed to it. Save the file as script.sbatch, one level above the Batch*/ directories that contain the R scripts.
# Open file with vi text editor on the head node
vi script.sbatch
#!/usr/bin/bash
#SBATCH --time=2-00            # run for at most 2 days
#SBATCH -p hns                 # submit to the hns partition
#SBATCH --ntasks-per-node=16   # reserve 16 cores
#SBATCH --nodes=1              # on a single node
#SBATCH --mem=90GB             # request 90 GB of memory

# Move into the Batch* directory passed as the first argument to sbatch
cd "${1}"

# Load R if it is not already on your PATH (module name/version may differ)
module load R

# Run the R script non-interactively; console output is written to output.out
R CMD BATCH --no-save --no-restore calibration.R output.out
Submit the jobs to the scheduling queue, one per Batch* directory.
# From your SCRATCH directory on the head node
for DIR in Batch*; do sbatch script.sbatch "$DIR"; done
Check whether your jobs have started yet.
# On the head node
sacct
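sacct lists your recent jobs and their states. If you only want to see jobs that are still pending or running, the standard Slurm squeue command filtered to your user also works:
# Show only your pending and running jobs
squeue -u $USER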
If you only plan to do a small amount of work, it is better to start an interactive session than to submit a batch job requesting many resources. Interactive sessions usually start quickly.
To request a short interactive session, use the following command:
sdev
This starts a quick interactive session on a compute node of the Sherlock cluster for shorter tasks or testing.
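If you need a specific session length, for example 30 minutes, sdev accepts a time limit flag (this assumes the -t option described in the Sherlock documentation; check the docs to confirm the exact syntax):
# Request a 30-minute interactive session explicitly
sdev -t 00:30:00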