Admin guide

Try to use Docker for non-trivial services. Docker containers allow easier setup, per-user configuration, scaling, and cleanup. The learning curve for basic usage is not steep, and it is a useful skill in software development.

Currently the cluster is managed from the command line over SSH. A large part of the system was configured by CPOS sysadmins, so there is no exact record of the software stack.

In short, the major additions we've installed since the handover are:

  • GitLab Omnibus edition (on cpu1) -> moved to Docker on storage1
  • MediaWiki 1.34 including MariaDB (on cpu1) -> moved to Docker on storage1
  • Docker, including docker-compose (on cpu1 and storage1)
  • X11/Xfce as part of a TigerVNC installation
  • Nginx reverse proxy to access the various web services (currently GitLab and the wiki)

The nodes synchronise user information via NIS, which was configured by CPOS and doesn't seem to include group info. If the NIS daemon stops it may prevent SSH login. The main NIS node is cpu1.

Network architecture

HKU ITS have registered the domain hpc.radiology.hku.hk to the storage1 IP, so storage1 will host a reverse proxy (an nginx container) that selects the upstream service by URL path, i.e. hpc.radiology.hku.hk/<servicename>.

Proxy upstreams include services such as GitLab and the MediaWiki instances.
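
As a rough illustration of path-based routing, the helper below prints an nginx location block for one service. The upstream names (gitlab:80, mediawiki:80) and the forwarded headers are assumptions made for this sketch, not the cluster's actual configuration.

```shell
#!/bin/sh
# Print an nginx location block that routes /<service>/ to an upstream.
# Both arguments are hypothetical examples, not the cluster's real values.
location_block() {
    service=$1    # path prefix, e.g. "gitlab"
    upstream=$2   # container name and port, e.g. "gitlab:80"
    cat <<EOF
location /${service}/ {
    proxy_pass http://${upstream};
    proxy_set_header Host \$host;
    proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
}
EOF
}

location_block gitlab gitlab:80
location_block wiki mediawiki:80
```

Because proxy_pass is written without a URI part, nginx forwards the request path unchanged, so each service would itself need to be configured to serve under /<servicename>/.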

Currently we think it is best to run the GitLab and wiki servers locally on storage1. New services will have to be evaluated case by case, as the 12 GB / 12-core hardware might not be enough to host everything. Likely future additions are a PACS and a web file manager.

Common Admin Tasks

Creating a new user

  sudo useradd -m -G students <username>

Then set the password:

  sudo passwd <username>
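
The two steps can be wrapped in a small helper that refuses odd names and existing accounts before touching anything. This is only a sketch: the username pattern is an assumption, not a documented site policy, and alice is a hypothetical user.

```shell
#!/bin/sh
# Sketch: create a cluster user in the "students" group, then set a password.

# Accept only conservative login names: a lowercase letter or underscore
# first, then lowercase letters, digits, underscores or hyphens (32 max).
# This pattern is an assumption, not a documented site policy.
valid_username() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z_][a-z0-9_-]{0,31}$'
}

create_user() {
    user=$1
    if ! valid_username "$user"; then
        echo "refusing unusual username: $user" >&2
        return 1
    fi
    if id "$user" >/dev/null 2>&1; then
        echo "user already exists: $user" >&2
        return 1
    fi
    sudo useradd -m -G students "$user" &&
        sudo passwd "$user"    # prompts interactively for the new password
}

# Usage, with a hypothetical username:
#   create_user alice
```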

Add user to a group

  sudo usermod -aG <groupname> <username>
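
To confirm that the change took effect, the membership can be checked with id. The helper below is a minimal sketch; alice and students are placeholder names.

```shell
#!/bin/sh
# Sketch: check whether a user is listed in a group.

# Succeeds if the word $1 appears in the space-separated list $2.
word_in_list() {
    printf '%s\n' "$2" | tr ' ' '\n' | grep -qxF -- "$1"
}

# Succeeds if user $1 is a member of group $2.
in_group() {
    # id -nG prints the user's group names separated by spaces
    word_in_list "$2" "$(id -nG "$1" 2>/dev/null)"
}

# Usage, with hypothetical names:
#   in_group alice students && echo "alice is in students"
```

Sessions that are already logged in only pick up a new group after logging in again. Since NIS here does not seem to carry group info, it may be worth running the check on each node.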

Anaconda in home folder

Run the following in order:

  su <username>
  cd ~/
  bash /home/utility/<anaconda script>

Follow the onscreen prompts to install into ~/Anaconda and choose "yes" to initialize the installation.

Print running docker containers on host

  sudo docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}'

Setup VNC

Create vncconfig in /etc/systemd/system (optional, for a permanent server)
As the user, run vncpasswd
cp /etc/X11/Xresources ~/.Xresources
cp /home/utility/xstartup ~/.vnc/xstartup  (for custom Xfce)
systemctl daemon-reload && systemctl start vncservice@:N.service && systemctl enable vncservice@:N.service (optional, for a permanent server)

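Connect through an SSH tunnel to port 5900+N on the server, where N is the display number. If the server will not start because of stale locks, optionally try rm -R /tmp/.X* to clean them; this also removes the lock and socket files of any X or VNC sessions that are still running, so check first. Consider restricting the number of connections per user.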
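
The port arithmetic and the tunnel can be sketched as follows. The login alice@cpu1 is a hypothetical example; substitute the real user and node.

```shell
#!/bin/sh
# Sketch: work out the TCP port for VNC display :N and print the matching
# SSH tunnel command. The user and host names below are hypothetical.

vnc_port() {
    # Display :N listens on TCP port 5900+N
    echo $((5900 + $1))
}

vnc_tunnel_cmd() {
    display=$1
    login=$2        # e.g. alice@cpu1
    port=$(vnc_port "$display")
    # -N: run no remote command, -L: forward a local port to the remote port
    echo "ssh -N -L ${port}:localhost:${port} ${login}"
}

vnc_tunnel_cmd 2 alice@cpu1
# prints: ssh -N -L 5902:localhost:5902 alice@cpu1
```

With the tunnel running, the VNC viewer on the local machine would connect to localhost:5902.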

Reconfigure reverse proxy

See the proxy info page

Common admin problems

Locked out of server

When adjusting iptables, first schedule a timed reset using cron or at, so that you can get back in if a new rule locks you out. In the worst case you will need physical access. If you haven't touched iptables, the problem could be the NIS server. Also make sure firewalld stays disabled.
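
One way to set up such a timed reset with at is sketched below. It assumes the atd service is running and that /root/iptables.backup is an acceptable place for the saved rules; both the path and the 10 minute delay are placeholders.

```shell
#!/bin/sh
# Sketch: save the current iptables rules and schedule an automatic restore
# before experimenting. The backup path and the delay are assumptions.

BACKUP=/root/iptables.backup

# Build the command that the scheduled job will run.
rollback_job() {
    echo "iptables-restore < $1"
}

arm_rollback() {
    minutes=${1:-10}
    # Save the known-good rules; the redirect must also run as root
    sudo sh -c "iptables-save > $BACKUP" || return 1
    # Queue the restore; at reads the job from standard input
    rollback_job "$BACKUP" | sudo at now + "$minutes" minutes
}

# Usage:
#   arm_rollback 10      # then edit the rules
#   sudo atq             # list pending jobs
#   sudo atrm JOB_ID     # cancel once the new rules are confirmed working
```

If the new rules cut off SSH, the queued job restores the saved rules after the delay and access should return without anyone going to the machine.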

Lost a docker container or volume

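Keep in mind that docker-compose down stops and removes the containers. Anonymous volumes are not deleted unless -v is passed, but they are not reattached on the next docker-compose up, so their data is effectively orphaned; with -v, named volumes declared in the compose file are deleted as well. Likewise, running a container with the --rm option removes its anonymous volumes when the container exits. Keep persistent data in named volumes or bind mounts.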

Make regular backups.
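
A named volume can be archived with a throwaway container, as sketched below. The volume name gitlab_data, the destination /home/utility/backups and the alpine image are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: archive a named Docker volume into a dated tarball.
# The volume name, destination directory and alpine image are assumptions.

backup_name() {
    # Archive name in the form VOLUME-DATE.tar.gz
    echo "$1-$2.tar.gz"
}

backup_volume() {
    volume=$1
    dest=${2:-$PWD}    # must be an absolute path for the bind mount
    archive=$(backup_name "$volume" "$(date +%F)")
    # Mount the volume read-only in a throwaway container and tar its contents
    sudo docker run --rm \
        -v "$volume":/data:ro \
        -v "$dest":/backup \
        alpine tar czf "/backup/$archive" -C /data .
}

# Usage, with hypothetical names:
#   backup_volume gitlab_data /home/utility/backups
#   sudo docker volume ls -f dangling=true   # volumes no container references
```

For a running database such as MariaDB, a dump taken with the database's own tools is likely safer than copying the volume files while the server is writing to them.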