Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: speed up by using uv to install python packages #2082

Closed
wants to merge 1 commit into from

Conversation

afuetterer
Copy link
Contributor

What does this PR do?

This is a proposal to use the uv for installing python packages in CI. It concurrently downloads the packages from pypi and improves the time of the install step significantly. From 3+ minutes to 30+ seconds.

What do you think?

Before submitting

  • This PR fixes a typo or improves the docs (if yes, ignore all other checks!).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes (if applicable)?
  • Did you write any new necessary tests?

@MaartenGr
Copy link
Owner

I love the rustification of all these packages! Haven't used uv myself but this much of a speed up is more than welcome. The only thing I need to be confirmed is that the process of installing BERTopic is technically not the same as through pip. Users are advised using pip directly to install BERTopic which might lead to a different set of dependencies although I doubt this will be the case in practice.

Having said that, when we test using uv but users install using pip are we then sure the use case of users (pip) is properly tested?

@afuetterer
Copy link
Contributor Author

If you look at a car or a bike, rust is usually a bad thing. But in this case, it is really great with the massive performance improvements.

Actually I am not sure about your question. Your package requires pandas>=1.1.5, which is a really old lower bound. If pip and uv use different resolver strategies, it might indeed lead to different versions of e.g. pandas. But is that a big problem?

I don't know, its your call.

If users use uv pip install bertopic, they will have faster install times, even if you don't use it in CI.

@MaartenGr
Copy link
Owner

Actually I am not sure about your question. Your package requires pandas>=1.1.5, which is a really old lower bound. If pip and uv use different resolver strategies, it might indeed lead to different versions of e.g. pandas. But is that a big problem?

It might be a problem if one of those versions results in bugs that we haven't tested through the unit tests. If uv gives a different version of pandas than pip, then it is technically possible that BERTopic installed with pip might be broken somewhere that we haven't detected yet. Over the last years, BERTopic has been broken a couple of times with newer versions of dependencies whether it was due to API changes (even when it weren't major new versions) or due to wheels that were not correctly built.

If users use uv pip install bertopic, they will have faster install times, even if you don't use it in CI.

That's true, the speed-up is definitely there. I am just wondering how many users will actually be doing this. In my experience, >90% of users will just run pip install bertopic also because many users are not that familiar with python.

I'm gonna be honest with you here, I'm not sure what the chance actually is that the testing pipeline using uv is different from a bare pip and the extend to which. That makes it also difficult for me to understand the actual impact.

@afuetterer
Copy link
Contributor Author

Yes, sure, things can break, if the API of a dependency changes, and there is no upper bound in place.

It's your decision. Happy to close this, if not desired. Just wanted to show the difference in installation time.

@MaartenGr
Copy link
Owner

Let's close it for now and re-open when we run into issues with the speed up of the pipeline or if uv becomes more of a staple amongst most users.

@MaartenGr MaartenGr closed this Jul 19, 2024
@afuetterer
Copy link
Contributor Author

Sure, no problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants