Admin guide
Currently the cluster is managed from the command line over SSH. A large part of the system was configured by CPOS sysadmins, so there is no exact record of the software stack.
- See key info at GPU_Cluster#Technical_Overview.
- See the Admin_install_log for changes made to the cluster. Please keep this log up to date.
In short, the major additions we've installed since the handover are:
- Gitlab omnibus edition (on cpu1) -> intend to move to docker on storage1
- Mediawiki 1.34 including mariadb (on cpu1) -> intend to move to docker on storage1
- Docker, including docker-compose (on cpu1 and storage1)
- An attempt at installing X11/Xfce as part of a TigerVNC installation. -> aborted, try a docker solution.
The nodes synchronise user information via NIS, which was configured by CPOS and doesn't seem to include group info. If the NIS daemon stops it may prevent SSH login. The main NIS node is cpu1.
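Because a stopped NIS daemon can block SSH logins, it may help to check the binding from a session that is already open. The following is a minimal sketch, assuming the standard NIS client utilities (ypwhich, ypcat) are installed on the node; the function name nis_status is hypothetical and not part of the existing setup.

```shell
#!/bin/sh
# Report whether this node is currently bound to an NIS server.
# Assumption: the yp-tools client utilities are installed.
nis_status() {
    if ! command -v ypwhich >/dev/null 2>&1; then
        echo "ypwhich not found: NIS client tools are not installed here"
        return 2
    fi
    if server=$(ypwhich 2>/dev/null); then
        echo "bound to NIS server: $server"
    else
        echo "not bound to any NIS server"
        return 1
    fi
}

# To check that a given account is visible through NIS, one could run:
#   ypcat passwd | grep '^<username>:'
nis_status || true
```

On a healthy node the reported server would be expected to be cpu1, since that is the main NIS node.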
Network architecture
HKU ITS have registered the domain hpc.radiology.hku.hk to the storage1 IP. So storage1 will host a reverse proxy (nginx container) that will use path-based resolution for upstream addresses, i.e. hpc.radiology.hku.hk/<servicename>.
Proxy upstreams include:
- The 4 compute nodes
- Services like gitlab, mediawikis, etc.
Currently we think it is best to run the gitlab and wiki servers locally on storage1. New services will have to be evaluated case by case, as the 12 GB / 12-core hardware might not be enough to host everything. Likely future additions are a PACS and a web file manager.
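To illustrate the path-based resolution described above, here is a rough sketch of a helper that prints one nginx location block per service. The function name proxy_location and the upstream names and ports (gitlab:80, mediawiki:80) are assumptions for illustration only; the actual values belong on the proxy info page.

```shell
#!/bin/sh
# Print an nginx "location" block that forwards /<service>/ to an upstream.
# Usage: proxy_location <service> <upstream-host:port>
proxy_location() {
    service=$1
    upstream=$2
    # \$ keeps nginx variables literal inside the unquoted heredoc.
    cat <<EOF
location /${service}/ {
    proxy_pass http://${upstream}/;
    proxy_set_header Host \$host;
    proxy_set_header X-Real-IP \$remote_addr;
    proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
}
EOF
}

# Hypothetical upstreams, not the cluster's real container names or ports.
proxy_location gitlab gitlab:80
proxy_location wiki mediawiki:80
```

The trailing slash on proxy_pass makes nginx strip the /<service>/ prefix before forwarding, so a service that builds absolute links will probably also need to be told its external path.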
Use Docker
Using docker containers allows easier setup, per-user configuration, scaling, and backup. Avoid breaking the servers with unnecessary software installed directly on the host; use containers for anything complex.
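As one example of the backup benefit, a named volume can be archived using a throwaway container. This is only a sketch of a common pattern, not a record of how backups are done on the cluster; the function name backup_volume, the volume name wiki_data, and the use of the alpine image are all assumptions.

```shell
#!/bin/sh
# Archive a named Docker volume into a dated tarball in the current directory.
# Usage: backup_volume <volume-name>
# Set DRY_RUN=1 to print the docker command instead of running it.
backup_volume() {
    volume=$1
    archive="${volume}-$(date +%Y%m%d).tar.gz"
    # Build the command as positional parameters so it can be printed or run.
    set -- docker run --rm \
        -v "${volume}:/data:ro" \
        -v "$(pwd):/backup" \
        alpine tar czf "/backup/${archive}" -C /data .
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        sudo "$@"
    fi
}

# "wiki_data" is a hypothetical volume name; this call only prints the command.
DRY_RUN=1 backup_volume wiki_data
```

The volume is mounted read-only so the backup container cannot modify the service's data.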
Common Admin Tasks
Creating a new user
sudo useradd -m -G students <username>
Then
sudo passwd <username>
Add user to a group
sudo usermod -aG <groupname> <username>
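The user and group steps above could be wrapped in a small helper along these lines. This is a sketch rather than an existing cluster script: the names run and add_cluster_user are hypothetical, and the default group students is taken from the useradd example above.

```shell
#!/bin/sh
# Create a user in a supplementary group and set their password.
# Usage: add_cluster_user <username> [group]   (group defaults to "students")
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
}

add_cluster_user() {
    user=$1
    group=${2:-students}
    # Stop if the account already exists (in local files or via NIS).
    if getent passwd "$user" >/dev/null 2>&1; then
        echo "user '$user' already exists" >&2
        return 1
    fi
    run sudo useradd -m -G "$group" "$user"
    run sudo passwd "$user"
    # Print the resulting group membership as a sanity check.
    run id -nG "$user"
}
```

For example, `DRY_RUN=1 add_cluster_user <username>` prints the three commands without changing anything, which is a safe way to review them first.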
Anaconda in home folder
su <username>
then
cd ~/
then
bash /home/utility/<anaconda script>
Follow onscreen prompts to install into ~/Anaconda and choose "yes" to initialize the installation.
Print running docker containers on host
sudo docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}'
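A slightly extended variant, sketched here under the assumption that stopped containers and their status are also of interest, adds the --all flag, a Status column, and an optional name filter. The function name list_containers is hypothetical.

```shell
#!/bin/sh
# List containers on this host (running and stopped), optionally filtered
# by a name substring.
# Usage: list_containers [name-filter]
list_containers() {
    format='table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}'
    if [ -n "${1:-}" ]; then
        sudo docker ps --all --filter "name=$1" --format "$format"
    else
        sudo docker ps --all --format "$format"
    fi
}
```

Calling `list_containers gitlab` would then show only containers whose name contains "gitlab".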
Reconfigure reverse proxy
See the proxy info page