Add Job_artifacts endpoints and commands #71


Draft · wants to merge 4 commits into master
Conversation

@arvidj (Collaborator) commented Nov 15, 2022

Adds the Job_artifacts module and implements three endpoints from https://docs.gitlab.com/ee/api/job_artifacts.html

Updated the lab tool with corresponding commands.

To do this, I had to expose the media_type argument of API.get.
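
For reference, here is a standalone sketch (not the actual diff; get_with_media_type is just an illustrative name) of what exposing the media_type boils down to with cohttp-lwt-unix: the caller controls the Accept header, so binary endpoints like the artifacts archive can request application/octet-stream.

```ocaml
(* Standalone sketch, not the PR code: exposing media_type just means the
   caller controls the Accept header sent with the request. *)
let get_with_media_type ?(media_type = "application/json") ~token uri =
  let headers =
    Cohttp.Header.of_list
      [ ("accept", media_type); ("Authorization", "Bearer " ^ token) ]
  in
  let open Lwt.Syntax in
  let* resp, body = Cohttp_lwt_unix.Client.get ~headers uri in
  let* body = Cohttp_lwt.Body.to_string body in
  Lwt.return (Cohttp.Response.status resp, body)
```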

@tmcgilchrist (Owner) commented:

Thanks @arvidj, it looks like you're still working on this, so I'll just leave some questions here.

Exposing the media_type looks right; up till now there wasn't a reason to.

Do you plan on filling in the remaining features for Job Artifacts?

Have you tested the download CLI with large artifacts? I didn't add a streaming response suitable for downloading binary artifacts, so it might not be very efficient.

@arvidj (Collaborator, Author) commented Nov 16, 2022

> Thanks @arvidj, it looks like you're still working on this, so I'll just leave some questions here.

Hi, it should be mostly finished, but I haven't reviewed the code myself yet.

> Exposing the media_type looks right; up till now there wasn't a reason to.

> Do you plan on filling in the remaining features for Job Artifacts?

For the moment, I only needed those three endpoints so I exposed only them. I could do the rest if you prefer.

> Have you tested the download CLI with large artifacts? I didn't add a streaming response suitable for downloading binary artifacts, so it might not be very efficient.

No, I haven't tried it on large artifacts. TBH, I have a very weak understanding of how to handle binary data efficiently, so I was happy to see that this "just worked" for the smaller files I tried. I'd be glad to do a more robust implementation, but I would need some pointers to get started. I'm now trying to download the artifacts of:

https://gitlab.com/tezos/tezos/-/jobs/3330701870/artifacts/browse

A weird thing I'm noticing is that GitLab intermittently returns 401 even though I have a token configured, and I verified that ocaml-gitlab sends along the headers. I'm writing it off as API flakiness for now, although it'd be nice to have better output in this case. A complication is that the GitLab error response is XML in this case!

🤡 OCAMLRUNPARAM=b GITLAB_DEBUG=1 dune exec cli/main.exe -- artifact get --project-id 3836952 --job-id 3330701870
>>> GitLab: Requesting https://gitlab.com/api/v4/projects/3836952/jobs/3330701870/artifacts (headers: accept: application/octet-stream
Authorization: Bearer glpat-[OMITTED]
user-agent: ocaml-gitlab ocaml-cohttp/5.0.0

)
>>> GitLab: Response code 302 Found

>>> GitLab: Requesting https://cdn.artifacts.gitlab-static.net/da/18/da1836cc3c2c6f6645e6d2a8a03c4f819e8256dfce5953aee56bfede82db502c/2022_11_16/3330701870/3635947479/build-x86_64-yy-tezt-typechecking.zip?Expires=1668593099&KeyName=gprd-artifacts-cdn&Signature=zEpStiJpX2i3PZMwsZNg_HVsNq4= (headers: accept: application/octet-stream
Authorization: Bearer glpat-[OMITTED]
user-agent: ocaml-gitlab ocaml-cohttp/5.0.0

)
>>> GitLab: Response code 401 Unauthorized

>>> GitLab: response body:
<?xml version='1.0' encoding='UTF-8'?><Error><Code>AuthenticationRequired</Code><Message>Authentication required.</Message></Error>
lab: internal error, uncaught exception:
     Yojson.Json_error("Line 1, bytes 0-33:\nExpected '{' but found '<?xml version='1.0' encoding='UTF'")
     Raised at Yojson.json_error in file "common.ml", line 5, characters 19-39
     Called from Yojson.Safe.read_lcurl in file "lib/read.ml" (inlined), line 2319, characters 3-42
     Called from Gitlab_j.read_message in file "lib/gitlab_j.ml", line 40752, characters 4-31
     Called from Gitlab_core.Make.API.handle_response.(fun) in file "lib/gitlab_core.ml", line 714, characters 30-61
     Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1860, characters 23-26
     Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3068, characters 20-29
     Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 31, characters 10-20
     Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 118, characters 8-13
     Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 124, characters 4-13
     Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24
     Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 34, characters 37-44

However, once the file had downloaded, it was identical to the one downloaded by curl. With curl, the download took ~20 seconds; with lab it took ~1 minute.
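
As an aside, the crash above happens because the error body goes straight to the JSON decoder. A possible guard (just a sketch, not the current handle_response; decode_error_body is a made-up name) would be to check the Content-Type first and report the raw body when it isn't JSON:

```ocaml
(* Sketch only: decode an error body as JSON only when the response says it is
   JSON; otherwise (e.g. the CDN's XML AuthenticationRequired error) return
   None so the caller can print the raw body instead of raising. *)
let decode_error_body ~headers body =
  let is_json ct =
    String.length ct >= 16 && String.sub ct 0 16 = "application/json"
  in
  match Cohttp.Header.get headers "content-type" with
  | Some ct when is_json ct -> (
      try Some (Yojson.Safe.from_string body)
      with Yojson.Json_error _ -> None)
  | _ -> None
```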

@arvidj (Collaborator, Author) commented Dec 6, 2022

@tmcgilchrist any hints on how to move forward on this? Do you think I need to:

> add a streaming response suitable for downloading binary artifacts

? I see that Cohttp_lwt.Body exposes to_stream, so the issue would just be threading that through the API module (and dealing with the semantic confusion of having two kinds of streams ;)).
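
Roughly what I mean, as a self-contained sketch (the URL and path are placeholders, not real endpoints):

```ocaml
(* Sketch: stream the response body to disk chunk by chunk instead of holding
   the whole archive in memory as one string. URL and path are placeholders. *)
let download_to_file uri path =
  let open Lwt.Syntax in
  let* _resp, body = Cohttp_lwt_unix.Client.get uri in
  let chunks = Cohttp_lwt.Body.to_stream body in
  Lwt_io.with_file ~mode:Lwt_io.Output path (fun oc ->
      Lwt_stream.iter_s (Lwt_io.write oc) chunks)

let () =
  Lwt_main.run
    (download_to_file
       (Uri.of_string "https://example.com/artifacts.zip")
       "artifacts.zip")
```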

@arvidj (Collaborator, Author) commented Dec 6, 2022

It doesn't seem too hard to change the API module so that handler functions receive the CLB.body instead of the body string directly. From there, we can choose to return a string Lwt_stream.t to the user, so that Job_artifacts.get : string Lwt_stream.t option Response.t Monad.t (that signature is a mouthful). I read online, though, that Lwt_streams are frowned upon. Another option is returning the CLB.write_body function, so that:

Job_artifacts.get : (string -> unit Lwt.t) option Response.t Monad.t

This way, the user is not exposed to any Lwt_stream.t and can write each chunk as they desire.

I'm not familiar with Lwt_stream, and the objections raised against it in ocsigen/lwt#250 are academic to me. I'm fine either way. I'll make a draft returning a stream; it should be easy to change that part anyhow.
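
To make the comparison concrete, here is a rough sketch of the callback shape (names are hypothetical, not PR code): the library keeps the Cohttp_lwt.Body internal and only feeds chunks to a user-supplied writer.

```ocaml
(* Sketch of the callback shape: the caller supplies a (string -> unit Lwt.t)
   chunk writer and never sees an Lwt_stream.t. *)
let consume_body ~(write : string -> unit Lwt.t) (body : Cohttp_lwt.Body.t) =
  Cohttp_lwt.Body.write_body write body

(* e.g. writing an artifact archive to a file: *)
let save_to_file body path =
  Lwt_io.with_file ~mode:Lwt_io.Output path (fun oc ->
      consume_body ~write:(Lwt_io.write oc) body)
```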

@tmcgilchrist (Owner) commented:

Thanks for the link to the Lwt_streams flaws thread; interesting read. My expectation for how Lwt_streams should work matches the second model from ocsigen/lwt#250 (comment), so it's interesting that it doesn't necessarily hold.

I would expect the Lwt_stream usage in this library to involve one writer and one reader when downloading/uploading artefacts, so the objections probably don't apply. I think following a model where you use the CLB.to/from_stream functions should work, though I haven't tried writing it myself.
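
Roughly what I have in mind for the upload direction, as an untested sketch under that one-writer/one-reader assumption (the actual cohttp names are Cohttp_lwt.Body.to_stream and of_stream; the chunk size and POST target are placeholders):

```ocaml
(* Untested sketch of the upload direction: build the request body from a file
   as a stream of chunks with Cohttp_lwt.Body.of_stream, so uploads don't
   buffer the whole file either. *)
let upload_file uri path =
  let open Lwt.Syntax in
  Lwt_io.with_file ~mode:Lwt_io.Input path (fun ic ->
      let chunks =
        Lwt_stream.from (fun () ->
            let* chunk = Lwt_io.read ~count:65536 ic in
            Lwt.return (if chunk = "" then None else Some chunk))
      in
      let body = Cohttp_lwt.Body.of_stream chunks in
      let* _resp, resp_body = Cohttp_lwt_unix.Client.post ~body uri in
      Cohttp_lwt.Body.to_string resp_body)
```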
