GPU Cluster

From radwiki

Revision as of 23:31, 10 April 2020

The department of Diagnostic Radiology manages five servers colloquially referred to as the "GPU cluster".

They were set up by IT services and handed over in February 2020. The rack is physically located at The Hong Kong Jockey Club Building For Interdisciplinary Research, 1/F.

Hardware Specifications

There are four compute nodes and one storage node, which shares its storage with all the other nodes.

Name     | Public IP address | Physical CPU cores | GPU             | RAM (GB) | Storage (TB) | Storage mount point       | URL
gpu1     | 147.8.193.173     | 16                 | 4x V100 (16 GB) | 64       | 1.6          | /scratch                  | hpc.radiology.hku.hk/gpu1
gpu2     | 147.8.193.172     | 16                 | 4x V100 (16 GB) | 64       | 1.6          | /scratch                  | hpc.radiology.hku.hk/gpu2
gpu3     | 147.8.193.175     | 16                 | 4x V100 (16 GB) | 64       | 1.6          | /scratch                  | hpc.radiology.hku.hk/gpu3
cpu1     | 147.8.193.174     | 40                 | NA              | 512      | 1.6          | /scratch                  | hpc.radiology.hku.hk/cpu1
storage1 | 147.8.193.171     | 12                 | NA              | 16       | 100          | Exported via NFS as /home | Reverse proxy for hpc.radiology.hku.hk
URL access is not yet fully available; direct IP address access should always work.

Usage

Users are currently expected to use the GPU cluster in the following ways:

  • Access to this wiki
  • Access to a personal wiki
  • Creating and using a GitLab account
  • Shell access to the compute nodes
    • Each user gets a personal home folder.
    • The home folder comes with a pre-installed Python environment (Anaconda), which users may in turn use to host a Jupyter Notebook server.
  • Other software may be installed upon request.
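Jupyter normally tries further ports on its own if its default port is taken, but when several users host notebook servers on the same node it can help to know a free port in advance, for example to connect to it from your own machine. The following is a minimal sketch using only the Python standard library; the function name find_free_port is hypothetical, and the printed command is just one possible way to start the server.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the operating system for a currently unused TCP port.

    Binding to port 0 makes the kernel pick a free port. The socket is closed
    on return, so another process could in principle take the port before
    Jupyter starts; treat the result as a hint, not a reservation.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    port = find_free_port()
    # Run the printed command from the Anaconda environment in your home folder.
    print(f"jupyter notebook --no-browser --port={port}")
```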

Choosing a server

Refer to the specs table for IP address information. storage1 is not intended for direct access, so password login for non-admins is disabled.
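The choice of server can be made concrete by encoding the specs table as data and filtering it by requirements. This is a hypothetical helper (the Node class and candidates function are not installed on the cluster); it excludes storage1 because password login is disabled for non-admins.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    ip: str
    cpu_cores: int    # physical cores
    gpus: int         # number of V100 (16 GB) cards, 0 if none
    ram_gb: int
    user_login: bool  # False for storage1: no password login for non-admins


# Values copied from the hardware specifications table on this page.
NODES = [
    Node("gpu1", "147.8.193.173", 16, 4, 64, True),
    Node("gpu2", "147.8.193.172", 16, 4, 64, True),
    Node("gpu3", "147.8.193.175", 16, 4, 64, True),
    Node("cpu1", "147.8.193.174", 40, 0, 512, True),
    Node("storage1", "147.8.193.171", 12, 0, 16, False),
]


def candidates(need_gpu: bool = False, min_ram_gb: int = 0) -> List[Node]:
    """Return the login nodes that satisfy the given GPU and RAM requirements."""
    return [
        n for n in NODES
        if n.user_login
        and (n.gpus > 0 or not need_gpu)
        and n.ram_gb >= min_ram_gb
    ]


if __name__ == "__main__":
    for node in candidates(need_gpu=True):
        print(node.name, node.ip)
```

For example, candidates(min_ram_gb=128) leaves only cpu1, which matches the table.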

For running code and scripts

Choose cpu1 or any of the three GPU servers (gpu1 to gpu3). Optionally, check the current resource usage on a server with top, ps, or nvidia-smi. Apart from GPU-related software, the four servers should have a similar software stack, including scripting languages such as Lua, Python, PHP, and Perl, a C/C++ compiler (gcc), and a git client.
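nvidia-smi can report per-GPU usage in machine-readable form through its --query-gpu and --format options. The sketch below, assuming Python 3.7 or later (for example the Anaconda environment in your home folder), parses that CSV output and picks the GPU with the least memory in use; the function names are hypothetical. The selected index could then be passed to a job through the CUDA_VISIBLE_DEVICES environment variable.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Fields requested from nvidia-smi; the order matters for parsing below.
QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"


@dataclass
class GpuStatus:
    index: int
    utilization_pct: int
    memory_used_mib: int
    memory_total_mib: int


def parse_gpu_csv(text: str) -> List[GpuStatus]:
    """Parse `--format=csv,noheader,nounits` output, one GPU per line."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue
        idx, util, used, total = (field.strip() for field in line.split(","))
        gpus.append(GpuStatus(int(idx), int(util), int(used), int(total)))
    return gpus


def query_gpus() -> List[GpuStatus]:
    """Run nvidia-smi on the current node and return per-GPU status."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


def least_busy(gpus: List[GpuStatus]) -> GpuStatus:
    """Prefer the GPU with the least memory used, then the lowest utilization."""
    return min(gpus, key=lambda g: (g.memory_used_mib, g.utilization_pct))


if __name__ == "__main__":
    best = least_busy(query_gpus())
    print(f"CUDA_VISIBLE_DEVICES={best.index}")
```

Another user may start a job on the same GPU right after the check, so this only reduces the chance of contention; it does not reserve anything.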

For file transfer/storage

It doesn't matter which server you choose, since the user home folder (/home/[userID]) is shared across servers. For example, a file uploaded to cpu1 is also available when you connect to gpu2. Additionally, users may create files and folders in /home/shared_data to make data accessible to other users.
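Whether other users can actually read what you put in /home/shared_data depends on its file permissions. A minimal sketch, roughly equivalent to running chmod -R go+rX on a folder you own: it grants read access to group and others, and execute (search) permission only on directories and on files that are already executable. The function name and the folder /home/shared_data/my_project are hypothetical.

```python
import os
import stat


def share_read_only(path: str) -> None:
    """Roughly `chmod -R go+rX path`: let group and others read, never write."""

    def widen(p: str) -> None:
        if os.path.islink(p):
            return  # do not follow symlinks to files outside the tree
        mode = stat.S_IMODE(os.stat(p).st_mode)
        extra = stat.S_IRGRP | stat.S_IROTH
        # Like chmod's "X": add execute only for directories or already-executable files.
        if os.path.isdir(p) or mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
            extra |= stat.S_IXGRP | stat.S_IXOTH
        os.chmod(p, mode | extra)

    widen(path)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            widen(os.path.join(root, name))


if __name__ == "__main__":
    project = "/home/shared_data/my_project"  # hypothetical folder name
    os.makedirs(project, exist_ok=True)
    share_read_only(project)
```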

If required for performance reasons, users may write to the /scratch directory. Unlike /home, files stored under /scratch are local to each server. Because /scratch is accessible to all users on a server, it is advisable to restrict others' access to your files and subfolders there, e.g. with chmod 700.
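A minimal sketch of that advice in Python, assuming a per-user folder such as /scratch/[userID] (a hypothetical layout, not something created for you). Instead of literally setting mode 700 on every file, it strips all group and other permission bits, like chmod -R go-rwx, which blocks other users in the same way without marking data files as executable. The function names are hypothetical.

```python
import getpass
import os
import stat


def make_private(path: str) -> None:
    """Roughly `chmod -R go-rwx path`: remove every group/other permission bit."""

    def restrict(p: str) -> None:
        if os.path.islink(p):
            return  # leave symlinks (and their targets) alone
        mode = stat.S_IMODE(os.stat(p).st_mode)
        os.chmod(p, mode & ~(stat.S_IRWXG | stat.S_IRWXO))

    restrict(path)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            restrict(os.path.join(root, name))


def private_scratch_dir(base: str = "/scratch") -> str:
    """Create (if needed) and lock down a per-user folder under base."""
    folder = os.path.join(base, getpass.getuser())  # hypothetical layout: /scratch/<userID>
    os.makedirs(folder, mode=0o700, exist_ok=True)
    make_private(folder)
    return folder
```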

Note

  • To use the shell access features of the GPU cluster, users need to get a server account.
  • All users must be on the HKU network / VPN to access anything.
  • The /scratch folders may be cleared without notice, so do not use them for long-term storage.
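Since /scratch may be wiped, results worth keeping should be copied back to the shared /home. A minimal sketch with the standard library, assuming Python 3.8 or later (needed for the dirs_exist_ok argument of shutil.copytree); the function name and the paths in the example are hypothetical.

```python
import shutil
from pathlib import Path


def save_from_scratch(src: str, dest: str) -> Path:
    """Copy a results folder from node-local /scratch into the shared /home.

    Existing files at the destination are overwritten; nothing is deleted
    from /scratch, so remove the source yourself once the copy is verified.
    """
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok lets repeated copies merge into an existing destination.
    shutil.copytree(src, dest_path, dirs_exist_ok=True)
    return dest_path


if __name__ == "__main__":
    # Hypothetical paths; substitute your own user ID and folder names.
    save_from_scratch("/scratch/userID/run1", "/home/userID/results/run1")
```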

Technical Overview

This section contains information that may be useful for admins.

  • All GPU cluster systems run on CentOS 7.
  • User creation is done on cpu1 and automatically synchronised across servers via NIS.
  • Normal users cannot install software with the system package manager.
  • The storage from storage1 is mounted as an NFS share on /home on the other four servers.
  • The root password is not available; the current sudoers are hpcadmin, itsupport, richard, and jurgen.
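For admins, a quick sanity check that /home on a given node really is an NFS mount (as it should be, coming from storage1) can be done by reading /proc/mounts. A minimal sketch; the helper names are hypothetical, and it only checks the filesystem type, not which server the share comes from.

```python
from typing import Dict


def parse_mounts(text: str) -> Dict[str, str]:
    """Map mount point -> filesystem type from /proc/mounts-style text.

    Each line is: device, mount point, fs type, options, dump, pass.
    Later lines win, matching the most recent mount on a given path.
    """
    mounts: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts[fields[1]] = fields[2]
    return mounts


def is_nfs_mount(mount_point: str, mounts_text: str) -> bool:
    """True if mount_point uses an NFS filesystem type (nfs or nfs4)."""
    return parse_mounts(mounts_text).get(mount_point, "").startswith("nfs")


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        print("/home is NFS:", is_nfs_mount("/home", f.read()))
```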

Known issues

See here for a to-do list of things that should be fixed.

GitLab Service

Runs on cpu1, served by Nginx on port 9171. Non-department users are currently not blocked.

User repositories are visible only to members of the same group. We created the department-wide group Radiology.

Create a backup with: sudo gitlab-backup create (default backup path: /var/opt/gitlab/backups)
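To confirm that a backup run produced a file, the newest archive in the default backup path can be located as below. A sketch, assuming GitLab's usual naming of backup archives ending in _gitlab_backup.tar; the directory is typically readable only by privileged users, so it would need to be run with sudo. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Default backup location noted on this page.
BACKUP_DIR = "/var/opt/gitlab/backups"


def latest_backup(backup_dir: str = BACKUP_DIR) -> Optional[Path]:
    """Return the most recently modified *_gitlab_backup.tar, or None if none exist."""
    archives = list(Path(backup_dir).glob("*_gitlab_backup.tar"))
    if not archives:
        return None
    return max(archives, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    newest = latest_backup()
    if newest is None:
        print("No backups found in", BACKUP_DIR)
    else:
        size_mb = newest.stat().st_size / 1e6
        print(f"{newest.name} ({size_mb:.1f} MB)")
```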

Mediawiki Service

Runs on cpu1, served by Apache on port 80. Non-department users are currently not blocked.

Emails

Todo. Required for password resets and admin alerts.

Backups

Todo. Updated tar to v1.32.

Firewall

  • Disabled firewalld. Using iptables defaults from IT services.
  • Confirmed all HKU IPs (public and private IP ranges) are whitelisted.
  • In addition to the defaults, we have opened ports 22, 80, 443, and 9171.
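To check from a machine on the HKU network that the opened ports actually accept connections, a plain TCP connect test is enough. A minimal sketch using the standard library; it only shows that something is listening, not which service, and the host in the example is cpu1's address from the specs table. The function name is hypothetical.

```python
import socket

# Ports opened in addition to the IT services defaults, per this page.
OPENED_PORTS = [22, 80, 443, 9171]


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


if __name__ == "__main__":
    host = "147.8.193.174"  # cpu1, which runs GitLab and MediaWiki
    for port in OPENED_PORTS:
        state = "open" if port_open(host, port) else "closed or filtered"
        print(f"{host}:{port} {state}")
```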