MIT Engaging High Performance Compute Cluster
The Engaging High Performance Compute Cluster is available to LINC team members to run their jobs at scale, including jobs that require GPUs.
Create an account
In order to access the Engaging Cluster, you will need an MIT Sponsored Account.
- Please contact Kabi at kabi@mit.edu with your organization name, date of birth, and phone number.
- Once the sponsored account is approved, you will receive an email to complete account registration and establish your MIT Kerberos identity.
- Please send your Kerberos ID to Kabi so that he can add you to the WebMoira group (`orcd_ug_pg_linc_all`), which grants access to the Engaging Cluster.
Documentation overview
The MIT Office of Research Computing and Data (ORCD) manages the Engaging Cluster. Most of the information you will need is in the first link below, but there are additional resources:
Access the cluster and run jobs
The Engaging Cluster has head/login nodes, which you use to access the cluster and submit jobs, and compute nodes, which run your resource-intensive scripts. Job orchestration is handled by the Slurm Workload Manager. The Engaging Cluster Documentation provides details on these operations, including:
- Logging into the cluster
- Cluster architecture, including the difference between head/login nodes and compute nodes
- Common commands to interact with the Slurm Job Scheduler
- Running multiple jobs in parallel with `sbatch` (see the example batch script after this list)
- Running interactive jobs on a single compute node with `srun` or `salloc`
- Accessing installed software
- Determining resources for your job
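As a concrete starting point, below is a minimal sketch of a batch script. The script name (`my_job.sh`), job name, resource values, and the `analysis.py` step are placeholders rather than LINC conventions; adjust them for your workload and the partitions available to you.

```bash
#!/bin/bash
# my_job.sh -- minimal sbatch script sketch (names and resource values are placeholders)
#SBATCH --job-name=linc_example       # job name shown by squeue
#SBATCH --output=linc_example_%j.log  # stdout/stderr; %j expands to the job ID
#SBATCH --time=01:00:00               # wall-clock limit (HH:MM:SS)
#SBATCH --cpus-per-task=4             # CPU cores for the task
#SBATCH --mem=8G                      # memory for the job

# The resource-intensive work runs on the compute node assigned by Slurm
python analysis.py
```

Submit it with `sbatch my_job.sh`. For an interactive session on a single compute node, a command along the lines of `srun --time=01:00:00 --cpus-per-task=4 --mem=8G --pty bash` (or an equivalent `salloc` request) should work, subject to the partitions and limits you are allowed to use.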
Slurm is a widely used workload manager, so you can also refer to the official Slurm documentation.
Compute nodes
The Engaging Cluster has both CPU-only compute nodes and GPU compute nodes. Nodes are grouped into partitions, which control which groups have access to them.
See Determining resources for your job for details on selecting the nodes and resources for your jobs. Briefly, the `sinfo` command shows the partitions where you can submit jobs, and you can submit to a specific partition by including `#SBATCH --partition=<partition_name>` in your sbatch script.
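For example, the commands below are a sketch of how to inspect the partitions before setting `--partition` in your script; the exact columns shown depend on the cluster's Slurm configuration.

```bash
# One-line summary of each partition you can see
sinfo -s

# Node states (idle, mixed, allocated, ...) within a specific partition
sinfo -p ou_bcs_high

# Custom output: partition, availability, time limit, node count, node state
sinfo -o "%P %a %l %D %t"
```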
The GPU nodes are available through the `ou_bcs_high` and `ou_bcs_low` partitions. For more details, see the BCS computing resources on Engaging - Slurm configuration wiki.
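Below is a sketch of a batch job that requests a GPU on one of these partitions. The partition names come from the wiki above, but the GPU count and the other resource values are assumptions; check the BCS Slurm configuration wiki for the limits that actually apply.

```bash
#!/bin/bash
# gpu_job.sh -- sketch of a GPU job (resource values are placeholders)
#SBATCH --job-name=linc_gpu_example
#SBATCH --partition=ou_bcs_high       # or ou_bcs_low
#SBATCH --gres=gpu:1                  # request one GPU; allowed counts/types are cluster-specific
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00

# Confirm a GPU is visible to the job before the GPU-dependent step
nvidia-smi

python train_model.py                 # placeholder for your GPU-dependent step
```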
Data storage
Data can be stored under `/orcd/data/linc/`. We are still working on an organization strategy for the LINC project data, so for now please store your data under a subdirectory (e.g. `/orcd/data/linc/<username>` or `/orcd/data/linc/<projectname>`). There are additional locations to store your data, including scratch space (`/orcd/scratch/bcs/001`, `/orcd/scratch/bcs/002`, `/pool001/<username>`), which are described on the Storage page and the BCS computing resources on Engaging wiki.
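For example, the commands below sketch how you might set up and check a personal subdirectory; the per-user layout is only a placeholder until a shared organization strategy is in place.

```bash
# Create a personal subdirectory under the shared LINC data path
mkdir -p /orcd/data/linc/$USER

# Check how much space your data currently uses
du -sh /orcd/data/linc/$USER

# Check free space on the shared filesystem
df -h /orcd/data/linc
```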
Best practices
- Please be respectful of these resources as they are used by many groups.
- Run resource-intensive scripts only on the compute nodes, never on the login/head nodes.
- Run a step of your script on a GPU compute node only if that step requires a GPU; run all other steps on a CPU-only compute node.
- Monitor your jobs frequently with `squeue -u <username>`; a few related commands are sketched below.
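A few monitoring commands worth keeping at hand (a sketch; `<jobid>` is the ID printed by `sbatch` when you submitted):

```bash
# List your queued and running jobs
squeue -u $USER

# Check the state, runtime, and peak memory of a running or finished job
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS

# Cancel a job you no longer need
scancel <jobid>
```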