HPC cluster 2

The Department of Diagnostic Radiology manages another five servers, colloquially referred to as the "GPU cluster 2". They were set up and handed over in April 2022. The rack is physically located at the laboratory building, LG3/F.

== Hardware Specifications ==

The servers are typically called nodes. There are four GPU computing nodes and one Windows storage node.

{| class="wikitable"
! Name !! Public IP address !! Physical CPU cores !! GPU !! RAM (GB) !! Storage (TB) !! Storage mount point !! URL
|-
| gpu1 || 147.8.100.183 || 48 || A30x4 || 128 || 5.2 || `/` || NA
|-
| gpu2 || 147.8.100.184 || 48 || A30x4 || 128 || 5.2 || `/` || NA
|-
| gpu3 || 147.8.100.185 || 48 || A30x4 || 128 || 5.2 || `/` || NA
|-
| gpu4 || 147.8.100.186 || 48 || A30x4 || 128 || 5.2 || `/` || NA
|-
| storage1 || 147.8.100.164 || 18 || NA || 16 || 100 || exported as NFS to `/home/[username]/share` || NA
|}

Realtime performance and usage metrics can be found at [[HPC Diagnostics and Statistics]].
 
== Usage ==
 
Users are currently expected to use the GPU cluster in the following ways:
# Access to this wiki
# Access to a [[GitLab Service|GitLab]] account for code sharing and collaboration
# [[SSH/SFTP| Shell]] access to the compute nodes (see the connection sketch below)
#* Users get their own user-specific home folder.

A few introductory guides are available to help users. Other software may be installed upon request, but users should note that they can manually install any software in their own home directory without needing admin privileges.

* To use the shell access features of the GPU cluster, users need to get a server account.
* All users must be on the HKU network / VPN to access anything.
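
As a minimal sketch of what shell access looks like (assuming a hypothetical account name <code>username</code>; pick any node's IP address from the specs table):

<pre>
# Connect to a compute node (gpu1 in this example) over SSH.
# You must be on the HKU network / VPN for the connection to succeed.
ssh username@147.8.100.183

# After logging in you land in your personal home folder:
pwd    # e.g. /home/username
</pre>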
 
=== Choosing a server ===
 
Refer to the [[GPU_Cluster#Hardware_Specifications| specs table]] for IP address information.
 
`storage1` is not intended for direct shell access, so password login for non-admins is disabled. The storage node transparently makes its storage capacity available to all nodes as the `/home/[username]/share` directory.
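
As a quick sanity check (a sketch; <code>username</code> is a placeholder for your own account name), you can confirm on any compute node that the share from <code>storage1</code> is mounted and see how much space is left:

<pre>
# List NFS-mounted filesystems on the current node; the export from storage1 should appear.
df -hT | grep -i nfs

# Show used and available space on your shared folder.
df -h /home/username/share
</pre>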
  
 
==== For running code and scripts ====
 
Choose any of the 4 GPU servers. Optionally check the local resource usage with the <code>top</code>, <code>ps</code> or <code>nvidia-smi</code> commands.
Apart from GPU-related details, the 4 servers should have a similar software stack. Installed software includes various scripting languages such as Lua, Python, PHP, and Perl, as well as a C/C++ compiler (`gcc`) and a git client. The home folder comes with a pre-installed Python environment ([[anaconda]]).
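
For example, before starting a job you might take a quick snapshot of the node's load (standard tools; the flags shown are just one possible invocation):

<pre>
# GPU utilisation and GPU memory usage on this node.
nvidia-smi

# One non-interactive snapshot of CPU/memory usage (first 20 lines).
top -b -n 1 | head -n 20

# Your own processes running on this node.
ps -u "$USER" -o pid,pcpu,pmem,etime,cmd
</pre>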
 
  
 
==== For file transfer/storage ====
 
It doesn't matter which server you choose, since the shared user folder (`/home/[username]/share`) is an NFS share mounted from the storage server; a file uploaded via one node is also visible on every other node. It is therefore suggested that all of your files and folders be saved in this directory. To evaluate available storage, see the `df` or `du` commands.
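
A sketch of a typical transfer and storage check (file and folder names are hypothetical; any node's IP address from the specs table will do, since the share is the same on every node):

<pre>
# Upload a dataset into your shared folder via gpu2; it is then visible on every node.
scp -r ./my_dataset username@147.8.100.184:~/share/

# Check the remaining space on the share and the size of what was uploaded.
df -h ~/share
du -sh ~/share/my_dataset
</pre>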
 
 
If required for performance reasons, users may write to the <code>/scratch</code> directory. Unlike the <code>/home</code> folder, files stored under <code>/scratch</code> are purely '''local to a server'''. As the `/scratch` folder is accessible to all users on the server, it is advisable to restrict access by others using <code>chmod 700</code> on your files and subfolders.
 
 
 
{{Note|* The `/scratch` folders may be cleared without notice, so do '''not''' use them for long-term storage.|warn}}
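
If you do use <code>/scratch</code>, a cautious pattern is to create your own subfolder, lock it down, and copy results back to the shared folder when done (a sketch, assuming <code>/scratch</code> is writable by ordinary users as described above):

<pre>
# Create a private working folder on the node-local scratch disk.
mkdir -p /scratch/$USER/experiment1
chmod 700 /scratch/$USER        # only you can read, write or enter it

# Work locally for speed, then copy results back to the shared folder before they are purged.
cp -r /scratch/$USER/experiment1/results ~/share/
</pre>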
 
  
 
== Security ==
 
Users may have several sets of login credentials:
* One for the Linux shell
* One for this wiki
* One for their GitLab account
 
 
All of the above passwords are stored in a salted hash format. This means that, in the event of a data breach, user passwords are not directly exposed. It also means they are not retrievable, not even by admins. Password resets are (ideally) handled by automated email.
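
As a rough illustration of what a salted hash is (not necessarily the exact scheme used by these services), the same kind of value can be generated with OpenSSL; the original password cannot be recovered from the stored string:

<pre>
# Produce a SHA-512-crypt salted hash of a password. The salt is random,
# so running this twice yields two different strings for the same password.
openssl passwd -6 'correct horse battery staple'
# Example output format: $6$<salt>$<hash>
</pre>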
 
  
 
{{Note|
* Please use a strong password and protect it.
* Non-anonymised patient data should be stored in '''encrypted''' format.|warn}}
 
=== Ports ===
 
Users may run servers listening on ports, e.g. a VNC/X11 server or a Jupyter notebook. Such connections should go through an SSH tunnel to ensure security and to pass through the firewall.
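
For example, a Jupyter notebook started on a compute node can be reached through a local SSH tunnel (port numbers, the node IP and the account name are illustrative; adjust them to your own session):

<pre>
# On the compute node: start Jupyter bound to localhost only, so no firewall port is opened.
jupyter notebook --no-browser --port 8888

# On your own machine: forward local port 8888 to the node through SSH.
ssh -N -L 8888:localhost:8888 username@147.8.100.183
# Then browse to http://localhost:8888 on your own machine.
</pre>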
 
 
Connections to websites on hpc.radiology.hku.hk should use SSL/TLS (i.e. https in the browser). This means traffic to the (proxy) server is encrypted.
 
 
== Administration ==
 
See [[gpu cluster technical overview| here]] for logs and technical details.
 
