Issues and Fixes

Bugs and Fixes

This page contains a log of technical issues with our JupyterHub deployment that we have resolved.

x509 Error When Starting up a Server

Caused by OpenEBS

OpenEBS manages its own certificates. They are renewed automatically whenever helm upgrade is run, so this error appears when no upgrade has been run for long enough that the certificates expire. The fix is to restart the OpenEBS admission-server pods by deleting them, which forces them to regenerate their certificates. Source

kubectl -n openebs get pods -o name | grep admission-server | xargs kubectl -n openebs delete
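Before deleting anything, it can help to preview which pods the pipeline above would match. The following is a minimal sketch, assuming the same openebs namespace; select_admission_pods and preview_openebs_restart are hypothetical helper names, not part of kubectl or OpenEBS.

```shell
# Hypothetical helper: keep only admission-server pods from a list of
# resource names on stdin. Returns success even when nothing matches.
select_admission_pods() {
  grep -- 'admission-server' || true
}

# Dry run: print the pods that the delete pipeline would target,
# without deleting anything.
preview_openebs_restart() {
  kubectl -n openebs get pods -o name | select_admission_pods
}
```

Running preview_openebs_restart should list only the admission-server pods. If the list is empty or contains unexpected entries, check the namespace and pod names before running the delete command.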

Caused by Microk8s

This can also occur if the microk8s certificates expire. Renew them with sudo microk8s refresh-certs -e ca.crt. Afterwards, a reboot may be required, and it can take approximately 20 minutes for all the services to update. The certificates need to be renewed manually once a year.
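Because the renewal is manual, checking how long the certificates have left can prevent the error from recurring. The sketch below is one possible approach, not part of microk8s itself: days_until and report_microk8s_certs are hypothetical helper names, the certificate directory is assumed to be the default for a snap install, and GNU date is assumed for parsing the openssl end date.

```shell
# Days between NOW (epoch seconds, defaults to the current time) and an
# openssl-style end date such as "Jan 11 00:00:00 2030 GMT".
# Assumes GNU date for the -d option.
days_until() {
  end_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch=${2:-$(date -u +%s)}
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Print the remaining lifetime of each certificate.
# The default directory is an assumption; pass another path if yours differs.
report_microk8s_certs() {
  cert_dir=${1:-/var/snap/microk8s/current/certs}
  for name in ca.crt server.crt front-proxy-client.crt; do
    end=$(sudo openssl x509 -enddate -noout -in "$cert_dir/$name" | cut -d= -f2)
    echo "$name: $(days_until "$end") days left"
  done
}
```

Running report_microk8s_certs from a scheduled job and alerting when any value drops below a chosen threshold would give advance warning before the yearly renewal is due.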

Remove and Reinstall CUDA Drivers

Nvidia drivers are the most common cause of issues that require server maintenance. The first approach should be to try resolving the issue with apt commands such as sudo apt update, apt list --upgradable, sudo apt upgrade, and sudo apt --fix-broken install. The server must be restarted after any Nvidia driver change.
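The apt-based first pass can be sketched as a small script. This is an illustration rather than a fixed procedure: run and repair_nvidia_with_apt are hypothetical helper names, and the DRY_RUN variable is an assumption added here so the sequence can be reviewed before it changes anything. Pending upgrades are listed with apt list --upgradable.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# First-pass repair using apt only. Reboot afterwards if any
# Nvidia driver package was changed.
repair_nvidia_with_apt() {
  run sudo apt update               # refresh package lists
  run apt list --upgradable         # show pending upgrades
  run sudo apt upgrade              # apply them
  run sudo apt --fix-broken install # repair broken dependencies
}
```

Calling DRY_RUN=1 repair_nvidia_with_apt prints the four commands without executing them; calling it without DRY_RUN runs them in order.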

If apt is unable to upgrade or repair the Nvidia drivers and libraries appropriately, the next step is to remove all Nvidia drivers and packages and reinstall them following Nvidia's documentation. That will look something like the following:

sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*" "nvidia*"
sudo apt-get autoremove && sudo apt-get autoclean
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb -O ~/cuda-keyring.deb
sudo dpkg -i ~/cuda-keyring.deb
sudo apt-get update
sudo apt-get install cuda nvidia-container-toolkit
sudo reboot
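After the reboot, a quick check that the driver loads can confirm the reinstall worked. The following is a minimal sketch; gpu_tool_ok is a hypothetical helper name, and it only checks that nvidia-smi is on PATH and exits cleanly, not that every CUDA library is functional.

```shell
# Succeeds only if TOOL (default: nvidia-smi) is on PATH and exits with status 0.
gpu_tool_ok() {
  tool=${1:-nvidia-smi}
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "$tool not found on PATH" >&2
    return 1
  fi
  "$tool" >/dev/null 2>&1
}

# Usage after the reboot:
#   gpu_tool_ok && nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
```

If gpu_tool_ok fails, the driver is either not installed or not loaded, and the reinstall steps above may need to be repeated.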
