Image on Docker Hub is out of date #180

dpkirchner · 2024-01-19T16:40:17Z

I'm just getting started with clearml (learning the ropes). Per the README section describing Kubernetes integration I tried using the image found on dockerhub, running it outside of k8s: docker run --gpus all -it --rm -v $HOME/clearml-agent.conf:/clearml.conf -v /var/run/docker.sock:/var/run/docker.sock --network clearml_backend --user root. (clearml_backend comes from https://github.com/allegroai/clearml-server/blob/702b6dc9c804165b192a042253ad1d1690c5f0ed/docker/docker-compose.yml` and clearml-agent.conf was created by clearml-agent init).

The output of this command is just: CLEARML_AGENT_UPDATE_VERSION = and the worker does not register. clearml-agent appears to be version 0.17.1 FWIW.

Then I noticed that the image was last updated about 3 years ago. Upgrading the clearml-agent package using pip install --upgrade clearml-agent and bind-mounting the configuration file in the /root directory resolved the problem; however, I'm sure there'll be a lot of other issues when using such an old base image (e.g., old CUDA).

I think this might just be a matter of updating the Dockerfile to pin a version of nvidia/cuda (base image) and pushing to Docker Hub.

jkhenning · 2024-01-21T09:19:01Z

Hi @dpkirchner, the link you provided does not seem to work, and I didn't quite understand which image you used.

dpkirchner · 2024-01-22T18:38:58Z

My bad, I added an extra backtick in the link: https://github.com/allegroai/clearml-server/blob/702b6dc9c804165b192a042253ad1d1690c5f0ed/docker/docker-compose.yml

The image I used was linked from here: https://github.com/allegroai/clearml-agent/blob/c9fc092f4eea9c3890d582aa2a098c3c2f39ce72/README.md#kubernetes-integration-optional (scroll down to Spin ClearML-Agent as a long-lasting service pod).

jkhenning · 2024-01-23T06:33:27Z

Oh, I see it now. Honestly, I think we should remove this option: it basically spawns tasks as processes inside the agent's pod, which is not a good pattern in k8s. I would recommend using the helm chart instead.

dpkirchner · 2024-01-26T18:06:17Z

I see, ok. I'll check out the helm chart. Thanks.

dpkirchner · 2024-01-26T22:27:36Z

It looks like the Docker container used by the helm chart is also out of date: it's running clearml-agent 1.2.4rc3 and using Python 3.6. The image that is closest to being up to date is allegroai/clearml:1.14.0-431; however, you'll need to install Docker and the clearml-agent Python package to use it, and it's still a bit out of date.

Through experimentation I've found that you can get the latest version by building the image yourself: check out https://github.com/allegroai/clearml-agent, go to the docker/agent directory, edit the Dockerfile to replace FROM nvidia/cuda with FROM nvidia/cuda:12.0.0-devel-ubuntu22.04 (12.3.1 can't be used because of a CUDA-related bug in NVIDIA's image), and then build the image locally (I'm using docker build -t clearml-agent:latest . in the docker/agent directory). Following these steps will get you version 1.7.0.
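The Dockerfile edit described above can be sketched as a small shell helper. This is only an illustration: pin_base_image is a hypothetical name and not part of clearml-agent, and it assumes the Dockerfile has a line beginning with FROM nvidia/cuda, as described above.

```shell
#!/bin/sh
# Sketch only: pin the CUDA base image in a Dockerfile before building.
# pin_base_image is a hypothetical helper, not part of clearml-agent.
pin_base_image() {
    dockerfile="$1"   # path to the Dockerfile to edit
    tag="$2"          # nvidia/cuda tag to pin
    tmp="$(mktemp)" || return 1
    # Match "FROM nvidia/cuda" with or without an existing tag and keep
    # anything that follows it (e.g. "AS build"). Other images are untouched.
    sed -E "s|^FROM nvidia/cuda(:[^[:space:]]+)?([[:space:]].*)?\$|FROM nvidia/cuda:${tag}\\2|" \
        "$dockerfile" > "$tmp" && cat "$tmp" > "$dockerfile"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage, following the steps described above:
#   git clone https://github.com/allegroai/clearml-agent
#   cd clearml-agent/docker/agent
#   pin_base_image Dockerfile 12.0.0-devel-ubuntu22.04
#   docker build -t clearml-agent:latest .
```

Writing through a temporary file instead of using sed -i keeps the helper portable between GNU and BSD sed.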

I'm reopening because I'm not sure if this is all intended: is the allegroai/clearml-agent docker image deprecated in general?

(I should note that the clearml-agent build command run in this image does not result in a docker image, but I think that's unrelated, and something to be tracked in a different issue.)

jkhenning · 2024-01-28T10:31:24Z

Hi @dpkirchner,

The docker image used by the clearml-helm-charts/clearml-agent chart is the allegroai/clearml-agent-k8s-base image, and it is indeed pretty old (we're supposed to update it soon). However, it is not related to the allegroai/clearml-agent image.

thomsmoreau · 2024-04-08T12:23:26Z

Hi @dpkirchner,
Do you have any info about the docker image update on Docker Hub? There are a lot of outdated elements in it; for example, "k8s_glue_example.py" does not accept a list of queues.

I cannot find a proper way to build the image, even with the 'docker' folder from the repository. Would it be possible to provide a README for building it locally?

dpkirchner · 2024-04-09T16:30:18Z

I wasn't able to figure out how to use clearml properly, unfortunately, so I moved on to another project.

surya9teja · 2024-05-08T06:57:53Z

@dpkirchner Frankly, I have been hopping between different MLOps stacks. I started with airflow + mlflow, but it lacks dataset versioning, so I moved to clearml. We use k8s (EKS) for most of our ETL pipelines. I deployed clearml-server, which works fine, but when I tried to deploy clearml-agent in the cluster, it seems to have issues accessing the API server:
clearml_agent.backend_api.session.session.LoginError: Failed getting token (error 401 from https://api.clear.ml): Unauthorized (invalid credentials) (failed to locate provided credentials)
As the clearml documentation is not clear about helm chart deployment, it's really hard to understand the code and do PRs.

surya9teja · 2024-05-08T07:09:00Z

@thomsmoreau As far as I can see, there is a folder k8s-glue which seems to have various versions of docker images. Depending on your cloud, you can modify the Dockerfile and update the outdated packages.

Note: During the build you have to modify or add clearml.conf with your credentials, as per the Dockerfile script.

I am not a fan of putting credentials into the docker image build, but at the same time the helm chart values have an option to pass the credentials as a secret, which is not working now.

To pass a list of queues to k8s_glue_example.py, you can pass it as 'queue1,queue2' (check here in values.json of the helm chart). Make sure there are no spaces between the strings.
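The no-spaces rule is easy to check before the value goes into the chart. A minimal sketch follows; check_queue_list is a hypothetical helper, not part of the helm chart or of clearml-agent, and rejecting empty entries is an extra assumption beyond what is stated above.

```shell
#!/bin/sh
# Sketch only: validate a comma-separated queue list such as 'queue1,queue2'.
# check_queue_list is a hypothetical helper, not part of clearml-agent.
check_queue_list() {
    case "$1" in
        # Reject empty input, any whitespace, and empty entries
        # (leading comma, trailing comma, or two commas in a row).
        "" | *[[:space:]]* | ,* | *, | *,,*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   check_queue_list 'queue1,queue2' && echo "ok"
#   check_queue_list 'queue1, queue2' || echo "contains whitespace or empty entries"
```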

thomsmoreau · 2024-05-08T11:20:49Z

@surya9teja I believed that the k8s_glue_example.py file in the docker image was up to date, but it is not. The version in the docker image provided by the chart does not handle the "," separator in the string passed via the "queue" argument, so I had to update it manually: first by doing a curl on the raw link you provided (I pulled the chart and changed the templates manually), and then by building a custom image for my company in which I changed only the script. It works fine! I did this about a month ago.

Since then I haven't checked for updates to the docker images, but I think we would get better results in terms of updated content and performance if the devs could push an update themselves.

Thanks for your message. I should have commented earlier to maybe help other people who were stuck as I was.

thomsmoreau · 2024-05-08T11:24:46Z

@jkhenning Do you have any info about updating the chart with an up-to-date docker image?

dpkirchner closed this as completed Jan 26, 2024

dpkirchner reopened this Jan 26, 2024

dpkirchner mentioned this issue Jan 26, 2024

clearml-agent build not building a docker image #182

Closed
