Issues and Fixes
This page contains a log of technical issues with our JupyterHub deployment that we have resolved.
OpenEBS manages its own certificates. They are renewed automatically whenever helm upgrade is run, so this issue can occur if no upgrade happens before the certificates expire. The solution is to restart the OpenEBS admission-server pods, which forces them to regenerate their certificates. Source
kubectl -n openebs get pods -o name | grep admission-server | xargs kubectl -n openebs delete
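The restart above can be wrapped in a small helper that fails loudly when no matching pods exist and then waits for the replacement pods to become ready. This is a minimal sketch, not part of the original fix: the function name restart_admission_server and the 120 second timeout are assumptions, and it presumes the pods are recreated by their controller as soon as they are deleted.

```shell
# Restart the OpenEBS admission-server pods and wait for the replacements.
# $1: namespace (assumed default: openebs)
restart_admission_server() {
  local ns="${1:-openebs}"
  local pods
  # Collect pod names such as "pod/openebs-admission-server-..."
  pods=$(kubectl -n "$ns" get pods -o name | grep admission-server) || {
    echo "no admission-server pods found in namespace $ns" >&2
    return 1
  }
  # Word splitting of $pods is intentional: one argument per pod name.
  kubectl -n "$ns" delete $pods
  # The timeout is an arbitrary choice for this sketch.
  kubectl -n "$ns" wait --for=condition=Ready pods --all --timeout=120s
}
```

Calling restart_admission_server with no arguments targets the openebs namespace, matching the one-line command above.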
This can also occur if the microk8s certificates expire. They can be renewed with:
sudo microk8s refresh-certs -e ca.crt
This may require a reboot and approximately 20 minutes for all the services to update. The certificates need to be renewed manually once a year.
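Because the renewal is manual, it can help to check how long the current certificates remain valid. Running sudo microk8s refresh-certs -c should report the expiry of the installed CA. The sketch below is an alternative that converts the output of openssl x509 -enddate into a day count. The function name days_until_expiry is hypothetical, the certificate path in the usage comment is an assumption about a default snap install, and the date parsing relies on GNU date.

```shell
# Print the number of whole days until a certificate expires.
# $1: a line printed by `openssl x509 -enddate -noout`,
#     e.g. "notAfter=Jan  1 00:00:00 2030 GMT"
# $2: optional "now" in epoch seconds (defaults to the current time)
days_until_expiry() {
  local end_date="${1#notAfter=}"
  local now="${2:-$(date +%s)}"
  local end_epoch
  # GNU date is assumed here for the -d option.
  end_epoch=$(date -u -d "$end_date" +%s) || return 1
  echo $(( (end_epoch - now) / 86400 ))
}

# Usage (the path is an assumption and may differ on your system):
#   days_until_expiry "$(sudo openssl x509 -enddate -noout \
#       -in /var/snap/microk8s/current/certs/server.crt)"
```

A negative result means the certificate has already expired.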
Nvidia drivers are the most common cause of issues that require server maintenance. The first approach should be to try resolving the issue with apt commands such as:
sudo apt update
apt list --upgradable
sudo apt upgrade
sudo apt --fix-broken install
The server must be restarted after any Nvidia driver change.
If apt is unable to upgrade or repair the Nvidia drivers and libraries appropriately, the next step is to remove all Nvidia drivers and packages and reinstall them following Nvidia's documentation. That will look something like the following:
sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*" "nvidia*"
sudo apt-get autoremove && sudo apt-get autoclean
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb -O ~/cuda-keyring.deb
sudo dpkg -i ~/cuda-keyring.deb
sudo apt-get update
sudo apt-get install cuda nvidia-container-toolkit
sudo reboot
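After the reboot, it is worth confirming that the driver actually loaded before returning the server to service. The following is a minimal sketch that queries nvidia-smi for the driver version. The function name check_nvidia_driver and the NVIDIA_SMI override variable are hypothetical and exist only to make the check easy to script.

```shell
# Print the loaded Nvidia driver version, or fail if none is reported.
# NVIDIA_SMI may be set to an alternative command (hypothetical override).
check_nvidia_driver() {
  local smi="${NVIDIA_SMI:-nvidia-smi}"
  local version
  # One line per GPU is printed; the first is enough for this check.
  version=$($smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)
  if [ -z "$version" ]; then
    echo "nvidia-smi did not report a driver version" >&2
    return 1
  fi
  echo "driver version: $version"
}
```

If the function returns a non-zero status, the driver is not responding and the reinstall steps above may need to be repeated.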