Welcome to the datasci-gpu Cluster
Introduction
Welcome to datasci-gpu, a GPU cluster that gives students in the Faculty of Mathematics at the University of Waterloo access to GPU compute. This documentation is a guide to understanding and using the cluster, which is managed through the Slurm workload manager.
The datasci-gpu cluster is one part of the larger datasci cluster. It provides GPU compute for students who want to experiment with GPUs for learning or coursework. The overall datasci cluster also includes a Hadoop cluster that is used specifically for the course CS 451. For more information about how to access the Hadoop side of the datasci cluster, please refer to your instructor's course materials.
Getting access to datasci-gpu
- Students: Send an email to dl-datasci-admin@uwaterloo.ca, letting us know the reason you're looking to access datasci-gpu. For example, if you're looking to use GPU resources for a specific course, let us know which course.
Before making an account request, please load an SSH key at https://authman.uwaterloo.ca
Contact
If you require assistance while using datasci-gpu, you can contact the following:
- Admin support: dl-datasci-admin@uwaterloo.ca
Before contacting the admins, please check the FAQs page, which answers common questions and gives guidelines on how best to structure an issue report for the datasci-gpu admins.
Slurm: How it works
Slurm is the workload manager that schedules jobs on datasci-gpu. From the command line, you can submit batch jobs, specify the resources each job needs, and monitor job progress.
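As a rough illustration of how resource requirements are specified, here is a minimal sketch of a batch script. The `#SBATCH` options used are standard Slurm options, but the values (one GPU, two CPU cores, 8 GB of memory, 30 minutes) and the job name `gpu-test` are assumptions chosen for illustration. They are not datasci-gpu defaults or limits, and the cluster may also require options not shown here, such as a partition.

```shell
#!/bin/bash
# Illustrative values only; adjust them to what your work actually needs.
#SBATCH --job-name=gpu-test          # hypothetical job name
#SBATCH --gres=gpu:1                 # request one GPU
#SBATCH --cpus-per-task=2            # request two CPU cores
#SBATCH --mem=8G                     # request 8 GB of memory
#SBATCH --time=00:30:00              # wall-clock limit of 30 minutes
#SBATCH --output=gpu-test-%j.out     # %j expands to the job ID

# Slurm sets SLURM_JOB_ID inside a job; fall back to a label elsewhere.
job_label="${SLURM_JOB_ID:-not-in-slurm}"
echo "Job ${job_label} running on $(hostname)"

# Show the GPU(s) visible to the job, if the NVIDIA tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found on this node"
fi
```

Bash treats the `#SBATCH` lines as ordinary comments, so the same file also runs outside Slurm, which can help when checking the script for typos before submitting it.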
Once your script is ready, launching a job is fairly simple:
- Login: Access datasci-gpu.cs using your credentials.
- Submit a Job: Use the `sbatch` command to submit the script that you wish to run. Think of it as asking the server to perform specific computations for you using specific resources (how many GPUs, how much memory, and so on).
- Monitor Progress: You can use `squeue` to view the job queue and monitor job details.
- Enjoy: Your job will be run by the server as soon as the requested resources are available.
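The steps above can be sketched as a short shell session, run on the login node once you have connected over SSH. The script name `gpu-test.sh` is hypothetical, so substitute your own batch script. The check for `sbatch` is only there so the sketch fails gracefully when run on a machine without Slurm.

```shell
#!/bin/bash
# Run this on the cluster login node, after logging in with your credentials.
script="gpu-test.sh"   # hypothetical batch script name

if ! command -v sbatch >/dev/null 2>&1; then
    submit_status="sbatch not found; run this on the cluster login node"
    echo "$submit_status"
else
    # --parsable makes sbatch print just the job ID
    # (optionally followed by ";cluster-name").
    job_id=$(sbatch --parsable "$script")
    job_id=${job_id%%;*}
    submit_status="submitted job ${job_id}"
    echo "$submit_status"

    squeue -u "$USER"      # all of your pending and running jobs
    squeue -j "$job_id"    # only the job just submitted

    # scancel "$job_id"    # uncomment to cancel the job
fi
```

A job that no longer appears in the `squeue` output has usually finished or been cancelled. Unless the script sets `--output`, Slurm writes the job's output to a file named `slurm-<jobid>.out` in the directory the job was submitted from.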
For more in-depth information, visit this page.
Happy Computing!