MIT Engaging High Performance Compute Cluster
The Engaging High Performance Compute Cluster is available to LINC team members to run their jobs at scale, including jobs that require GPUs.
Create an account
In order to access the Engaging Cluster, you will need an MIT Sponsored Account.
- Please contact Kabi at kabi@mit.edu with your organization name, date of birth, and phone number.
- Once the sponsored account is approved, you will receive an email to complete account registration and establish your MIT Kerberos identity.
- Please send your Kerberos ID to Kabi so that he can add you to the WebMoira group (`orcd_ug_pg_linc_all`), which grants access to the Engaging Cluster.
Documentation overview
The MIT Office of Research Computing and Data (ORCD) manages the Engaging Cluster. Most of the information you will need is in the first link below, but there are additional resources:
Access the cluster and run jobs
The Engaging Cluster has head/login nodes, which you use to access the cluster and submit jobs, and compute nodes, which run your resource-intensive scripts. Job orchestration is handled by the Slurm Workload Manager. The Engaging Cluster Documentation provides details on these operations, including:
- Logging into the cluster
- Cluster architecture, including the difference between head/login nodes and compute nodes
- Common commands to interact with the Slurm Job Scheduler
- Running multiple jobs in parallel with `sbatch` (see the example batch script after this list)
- Running interactive jobs on a single compute node with `srun` or `salloc`
- Accessing installed software
- Determining resources for your job
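As a concrete starting point, below is a minimal sketch of a batch script. The script name (`my_job.sh`), job name, resource values, and the `analysis.py` step are placeholders rather than LINC conventions; adjust them for your workload and the partitions available to you.

```bash
#!/bin/bash
# my_job.sh -- minimal sbatch script sketch (names and resource values are placeholders)
#SBATCH --job-name=linc_example       # job name shown by squeue
#SBATCH --output=linc_example_%j.log  # stdout/stderr; %j expands to the job ID
#SBATCH --time=01:00:00               # wall-clock limit (HH:MM:SS)
#SBATCH --cpus-per-task=4             # CPU cores for the task
#SBATCH --mem=8G                      # memory for the job

# The resource-intensive work runs on the compute node assigned by Slurm
python analysis.py
```

Submit it with `sbatch my_job.sh`. For an interactive session on a single compute node, a command along the lines of `srun --time=01:00:00 --cpus-per-task=4 --mem=8G --pty bash` (or an equivalent `salloc` request) should work, subject to the partitions and limits you are allowed to use.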
Slurm is a widely used workload manager, so you can also refer to the official Slurm documentation.
Compute nodes
The Engaging Cluster has both CPU-only compute nodes and GPU compute nodes. Nodes are grouped into partitions, which control which groups have access to them.
See Determining resources for your job for details on selecting the nodes and resources for your jobs. Briefly, the `sinfo` command shows the partitions where you can submit jobs, and you can submit to a specific partition by including `#SBATCH --partition=<partition_name>` in your sbatch script.
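For example, the commands below are a sketch of how to inspect the partitions before setting `--partition` in your script; the exact columns shown depend on the cluster's Slurm configuration.

```bash
# One-line summary of each partition you can see
sinfo -s

# Node states (idle, mixed, allocated, ...) within a specific partition
sinfo -p ou_bcs_high

# Custom output: partition, availability, time limit, node count, node state
sinfo -o "%P %a %l %D %t"
```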
The GPU nodes are available through the `ou_bcs_high` and `ou_bcs_low` partitions. For more details, see the BCS computing resources on Engaging - Slurm configuration wiki.
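Below is a sketch of a batch job that requests a GPU on one of these partitions. The partition names come from the wiki above, but the GPU count and the other resource values are assumptions; check the BCS Slurm configuration wiki for the limits that actually apply.

```bash
#!/bin/bash
# gpu_job.sh -- sketch of a GPU job (resource values are placeholders)
#SBATCH --job-name=linc_gpu_example
#SBATCH --partition=ou_bcs_high       # or ou_bcs_low
#SBATCH --gres=gpu:1                  # request one GPU; allowed counts/types are cluster-specific
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00

# Confirm a GPU is visible to the job before the GPU-dependent step
nvidia-smi

python train_model.py                 # placeholder for your GPU-dependent step
```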
Data storage
Data can be stored under `/orcd/data/linc/`. We are still working on an organization strategy for the LINC project data, so for now please store your data under a subdirectory (e.g. `/orcd/data/linc/<username>` or `/orcd/data/linc/<projectname>`). There are additional locations to store your data, including scratch space (`/orcd/scratch/bcs/001`, `/orcd/scratch/bcs/002`, `/pool001/<username>`), which are described on the Storage page and the BCS computing resources on Engaging wiki.
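For example, the commands below sketch how you might set up and check a personal subdirectory; the per-user layout is only a placeholder until a shared organization strategy is in place.

```bash
# Create a personal subdirectory under the shared LINC data path
mkdir -p /orcd/data/linc/$USER

# Check how much space your data currently uses
du -sh /orcd/data/linc/$USER

# Check free space on the shared filesystem
df -h /orcd/data/linc
```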
Best practices
- Please be respectful of these resources as they are used by many groups.
- Run resource-intensive scripts only on the compute nodes, never on the login/head nodes.
- Run a step of your script on a GPU compute node only if that step requires a GPU; run all other steps on a CPU-only compute node.
- Monitor your jobs frequently with `squeue -u <username>`; a few related commands are sketched below.
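A few monitoring commands worth keeping at hand (a sketch; `<jobid>` is the ID printed by `sbatch` when you submitted):

```bash
# List your queued and running jobs
squeue -u $USER

# Check the state, runtime, and peak memory of a running or finished job
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS

# Cancel a job you no longer need
scancel <jobid>
```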