Simple inference from loading traced pytorch models #978

Open
maxwellflitton opened this issue Sep 27, 2023 · 23 comments

Comments

@maxwellflitton

Hey, I'm super excited that this library is being built. I've been looking through the documentation and examples, and I can't find any example of how to load a traced model and run a simple inference on it. Is there a way to do this? If not, are there any plans to support it, and if so, are there any contribution guidelines?

@EricLBuehler
Member

Hi, glad you are interested! I would suggest looking at the llama example in candle-examples; running cargo run --example llama performs inference.
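
For a sense of what inference looks like at the candle API level, here is a minimal sketch of a forward pass. The layer sizes and random weights below are made up purely for illustration:

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{Linear, Module};

fn main() -> Result<()> {
    let device = Device::Cpu;
    // A made-up 4 -> 2 linear layer with random weights, just to show the shape of the API.
    let weight = Tensor::randn(0f32, 1f32, (2, 4), &device)?;
    let bias = Tensor::zeros(2, DType::F32, &device)?;
    let layer = Linear::new(weight, Some(bias));
    // A single input row; `forward` runs the actual computation.
    let input = Tensor::randn(0f32, 1f32, (1, 4), &device)?;
    let output = layer.forward(&input)?;
    println!("{output}");
    Ok(())
}
```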

@LaurentMazare
Collaborator

If the question is more about running inference on a model exported from PyTorch by tracing (i.e. in TorchScript form), we don't support this at the moment. You may already be aware of this, but tch-rs does support it, with an example here.
In candle, we currently support loading a PyTorch checkpoint and extracting the weights out of it, but not the actual model architecture etc. If you have a very specific use case, and ideally some model files, this would be very interesting to look at, though it's probably a pretty large effort to support (and so unlikely to happen very soon).
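
For reference, extracting weights from a .pt checkpoint looks roughly like the sketch below with candle's pickle reader (the file name is a placeholder; this recovers the named tensors only, not the architecture):

```rust
use candle_core::pickle;

fn main() -> candle_core::Result<()> {
    // "model.pt" is a placeholder path; read_all returns the named tensors stored in the checkpoint.
    let tensors = pickle::read_all("model.pt")?;
    for (name, tensor) in tensors.iter() {
        println!("{name}: {:?}", tensor.shape());
    }
    Ok(())
}
```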

@EricLBuehler
Member

Yes, perhaps the Tensor type could become a trait and there would be TracedTensor and Tensor? This is an interesting idea.

@LaurentMazare
Collaborator

Oh, I don't mean we should support tracing in candle (we actually already do that via the graph that is maintained internally), more that we could support executing a model exported from PyTorch in TorchScript form. No need for a new trait or anything for this; it would more likely be an external crate rather than something done in candle-core.

@EricLBuehler
Member

EricLBuehler commented Sep 27, 2023

Ah, ok. Supporting TorchScript might be another candle-xxx crate?


@maxwellflitton
Author

maxwellflitton commented Sep 27, 2023

@LaurentMazare thanks for your reply. I'm working at SurrealDB, a database written in Rust, and we're building support for inference on traced models exported from PyTorch. We have integrated tch-rs, but it requires the torch library, and one of the features of our database is that it can be embedded, so if candle could support inference on traced PyTorch models that would be amazing. I'm happy to pitch in, and my employer might set aside some time for me to contribute to candle for PyTorch inference, as SurrealDB is open source itself. If you have any reading materials, I can draw up a proposal. @EricLBuehler if another crate is a good idea, I'm very willing to contribute.

@EricLBuehler
Member

EricLBuehler commented Sep 27, 2023

I was just looking at the .pt format: https://pytorch.org/tutorials/beginner/saving_loading_models.html. It looks like it is a serialized dict? I may be wrong.

I think another crate is a good idea because it is likely functionality that not everyone needs. I would be happy to create it, but let me know. This could be very interesting - a sort of interpreter that could perhaps be JIT compiled with inkwell.

@LaurentMazare
Collaborator

@maxwellflitton just to propose something slightly different: do you have specific reasons to prefer exporting to TorchScript rather than exporting to ONNX? We've been thinking about having an onnx runtime for candle for a while, and this would probably be easier, as ONNX is a well-documented format that was designed explicitly for interoperability.

@EricLBuehler
Member

EricLBuehler commented Sep 27, 2023

True, and TorchScript can convert to ONNX.

@maxwellflitton
Author

@LaurentMazare as long as ONNX lets us run inference on models alongside neural networks, I'm down for that. A lot of models in production are random forests, and hummingbird has managed to convert sklearn random forests to PyTorch models. It's an exciting time to build inference libraries. Because of the embedded nature of the database, I really want the ONNX C++ library to be directly fused into the Rust binary. What do you think about building a Rust wrapper for the ONNX library using the cxx crate? If we can take this approach, I think I have a strong case for my place of work to allocate resources to it.

@LaurentMazare
Collaborator

If you're thinking about the onnxruntime C++ library, there are already pretty good Rust bindings for it, and you can probably statically link your binary with it, so no need to use the cxx crate or anything: just a cargo add ort or some equivalent and you should be all good.
I would be biased towards using candle though :) I was more thinking about having an onnx interpreter for candle. The onnx format is specified in protobuf, and as long as the required ops have a candle equivalent, the interpreter should be pretty straightforward. This would allow for a mix-and-match approach: the model could be exported with onnx, but some pre-processing or post-processing could be done with the pure Rust candle experience.
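
To sketch what such an interpreter could look like (this is not an existing candle API, just an illustration with a hypothetical Node type standing in for the protobuf-generated structures): walk the graph in topological order and dispatch each onnx op to its candle equivalent.

```rust
use std::collections::HashMap;
use candle_core::{Result, Tensor};

// Hypothetical, heavily simplified stand-in for the protobuf-generated onnx node type.
struct Node {
    op_type: String,
    inputs: Vec<String>,
    output: String,
}

// Evaluate the nodes in topological order, mapping each onnx op to its candle equivalent.
fn eval(nodes: &[Node], mut values: HashMap<String, Tensor>) -> Result<HashMap<String, Tensor>> {
    for node in nodes {
        let inputs: Vec<Tensor> = node
            .inputs
            .iter()
            .map(|name| values.get(name).expect("missing input").clone())
            .collect();
        let out = match node.op_type.as_str() {
            "Add" => inputs[0].broadcast_add(&inputs[1])?,
            "MatMul" => inputs[0].matmul(&inputs[1])?,
            "Relu" => inputs[0].relu()?,
            op => candle_core::bail!("unsupported onnx op {op}"),
        };
        values.insert(node.output.clone(), out);
    }
    Ok(values)
}
```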

@EricLBuehler
Member

I agree, as it would be more cohesive and we would need a wrapper for ort anyway. @LaurentMazare, do you think this would be better suited as an internal or external crate?

@maxwellflitton
Author

@LaurentMazare that would be amazing! I'd definitely be up for this, as it really leans into our desire to have pure Rust in our database given its embedded and WASM nature. I'm going to get the basic C++ integration working in the database, which shouldn't take too long, and then I'm happy to commit to building an onnx interpreter for candle. Is there anywhere we can work on a roadmap, draft tasks, or define a layout?

@LaurentMazare
Collaborator

Really up to you. I would suggest creating a specific repo & crate for the candle-onnx interpreter and either checking in some design documents in the repo or using the wiki/issues/... not sure what people typically use for this.
As initial steps, I think it would probably be about gathering the protobuf files that describe onnx, setting up the build scripts so that the associated Rust structures get generated, parsing some sample files, sketching some intermediate representation for the onnx compute graph, and having a candle interpreter that runs on top of this. I'm far from an expert on onnx, so I'm certainly overlooking lots of things :)
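
For the protobuf step, one possible shape of the build script, assuming prost-build is used to generate the Rust structures from a vendored onnx.proto (the paths are placeholders):

```rust
// build.rs - sketch: compile a vendored onnx.proto into Rust types with prost-build.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("cargo:rerun-if-changed=src/onnx.proto");
    prost_build::compile_protos(&["src/onnx.proto"], &["src/"])?;
    Ok(())
}
```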

@EdorianDark

There is already an ONNX runtime written in Rust: wonnx.
At least the file loading might be reusable from there.

@EricLBuehler
Member

Looks like it is its own separate project, so its types and API would not match candle's. Perhaps we could base a custom implementation off of wonnx. @maxwellflitton, what are your thoughts on this vs. implementing from scratch?

@EdorianDark

There is also tract, which can also load ONNX files.
It seems better separated into modules, but it is not well documented.

@maxwellflitton
Author

@EdorianDark thanks, I checked out the wonnx crate and hit some basic inference issues, which I've raised here. I'm going to check out tract. @EricLBuehler I'm going to look into the two crates a bit more. I'm more of a fan of implementing it from scratch, but I'll see what they're both like. Implementing from scratch will take longer, but it will reduce our dependencies and we won't be constrained by what another project supports.

@LaurentMazare
Collaborator

@maxwellflitton did you get some time to start looking at this? If you didn't, and you think you may not have time soon, I may take a stab at it - I expect it to actually be fairly simple and quite useful to potential users.

@maxwellflitton
Author

@LaurentMazare sorry, not yet. I've had to get ONNX into SurrealDB somehow for our ML feature, so I embedded the ONNX C runtime into the Rust binary, as seen here:

https://github.com/surrealdb/surrealml/blob/main/modules/utils/src/execution/onnx_environment.rs

Once that's done and deployed, I'm more than happy to get involved. If you're starting, do you have a scoped-out project board or anything?

@LaurentMazare
Collaborator

> Once that's done and deployed, I'm more than happy to get involved. If you're starting, do you have a scoped-out project board or anything?

I haven't started yet - I also haven't planned on having a board or anything. My guess is that it's something like a day or two of work (just supporting the ops that are readily available in candle), so I was thinking it's more a matter of just trying it out rather than planning ahead.

@LaurentMazare
Collaborator

I've merged some preliminary support for onnx. For now it lives in candle-onnx in the main repo, but I'm on the fence about moving it out; we'll see.
It's very basic at the moment: it only allows you to load an onnx file and either print it or do some very naive evaluation with the command below (inputs are all set to 0 in the onnx_basics.rs example, but one could easily use it with appropriate values).

cargo run --example onnx_basics -- simple-eval --file myfile.onnx

As next steps, I'll try adding support for a bunch more ops, hopefully enough to run a couple of non-trivial models.
Once that's done, the next step would probably be to add a candle-onnx-specific computation graph structure so that the model can be pre-processed once and then evaluated multiple times without the overhead of the initial setup.
Supporting ops in a more composable way, handling opsets, arbitrary values rather than just tensors, etc. will only come after that.
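
For anyone calling this from Rust rather than through the example binary, usage looks roughly like the sketch below (the file path, input name, and input shape are placeholders; read_file and simple_eval are the entry points the onnx_basics example is built on):

```rust
use std::collections::HashMap;
use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    // "myfile.onnx" and the input name/shape are placeholders for whatever the exported graph expects.
    let model = candle_onnx::read_file("myfile.onnx")?;
    let input = Tensor::zeros((1, 3, 224, 224), DType::F32, &Device::Cpu)?;
    let mut inputs = HashMap::new();
    inputs.insert("data".to_string(), input);
    // Naive evaluation: every node is computed on the fly, no pre-processed graph yet.
    let outputs = candle_onnx::simple_eval(&model, inputs)?;
    for (name, tensor) in outputs.iter() {
        println!("{name}: {:?}", tensor.shape());
    }
    Ok(())
}
```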

@LaurentMazare
Collaborator

Another update: the current implementation seems good enough to run squeezenet, so I've made this an example, squeezenet-onnx. I'll push on a couple more ops so as to be able to run more models, and after that the next step is likely to be adding the intermediate structure for the compute graph, validating the different ops/inputs/outputs properly, and polishing everything.
