
Bugs and Fixes

This page contains a log of technical issues with our JupyterHub deployment that we have resolved.

x509 Error When Starting up a Server


Caused by OpenEBS

OpenEBS manages its own certificates, and these are regenerated automatically whenever helm upgrade is run. If no upgrade is run for a long time, the certificates can expire and cause this error. The fix is to restart the OpenEBS admission-server pods, which forces them to regenerate their certificates. Source

kubectl -n openebs get pods -o name | grep admission-server | xargs kubectl -n openebs delete
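To confirm the fix worked, check that the admission-server pods have been recreated and are running again (this assumes the default openebs namespace and pod naming):

kubectl -n openebs get pods | grep admission-server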

Caused by Microk8s

This can also occur if the MicroK8s certificates expire. They can be renewed with sudo microk8s refresh-certs -e ca.crt. This may require a reboot and then roughly 20 minutes for all of the services to come back up. The certificates must be renewed manually once a year.
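To check whether an expired certificate is actually the cause, the expiry date can be inspected with openssl before renewing; the path below assumes the default MicroK8s snap install location:

sudo openssl x509 -noout -enddate -in /var/snap/microk8s/current/certs/ca.crt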

Remove and Reinstall CUDA Drivers

Nvidia drivers are the most common cause of issues that require server maintenance. The first approach should be to try resolving the issue with apt commands such as sudo apt update, apt list --upgradable, sudo apt upgrade, and sudo apt --fix-broken install. Restarting the server is required after any Nvidia driver change.
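As a rough sketch, that first-pass repair usually runs in this order, with nvidia-smi after the reboot to confirm the driver loads:

  • sudo apt update
  • apt list --upgradable
  • sudo apt upgrade
  • sudo apt --fix-broken install
  • sudo reboot
  • nvidia-smi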

If apt is unable to upgrade or repair the Nvidia drivers and libraries appropriately, the next step is to remove all Nvidia drivers and packages and reinstall them following Nvidia's documentation. That will look something like the following:

  • sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*" "nvidia*"
  • sudo apt-get autoremove && sudo apt-get autoclean
  • wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb -O ~/cuda-keyring.deb
  • sudo dpkg -i ~/cuda-keyring.deb
  • sudo apt-get update
  • sudo apt-get install cuda nvidia-container-toolkit
  • sudo reboot
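After the reboot, it is worth verifying both that the host driver is loaded and that Kubernetes still advertises the GPUs; the second command assumes the GPU device plugin is enabled on the node:

  • nvidia-smi
  • kubectl describe nodes | grep -i "nvidia.com/gpu"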