
Pickling a trained model (NNX) #4247

Open
eugene opened this issue Oct 3, 2024 · 10 comments

Comments

@eugene

eugene commented Oct 3, 2024

I train small models, and while prototyping and testing I wish to store trained models in a simple way (also keeping the model configuration and training stats inside the model object as dicts/arrays). Ideally I want to deal with a single object and be able to simply:

with open('my-model.pk', 'wb') as file:
    pickle.dump(model, file)

and later:

with open('my-model.pk', 'rb') as file:
    model = pickle.load(file)

Similar to torch.save.

When I naively try to do the above I get:

----> 2     pickle.dump(model, file)
AttributeError: Can't pickle local object 'variance_scaling.<locals>.init'

Splitting into graphdef and state results in the following error:

cannot pickle 'PyTreeDef' object

While I am aware there is orbax and it might save the state, I really wish it were possible to avoid that dependency and keep things simple. Is there a trick or workaround I can use to achieve the desired functionality?
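The first error comes from the stdlib pickle refusing to serialize functions defined inside other functions (which is exactly what variance_scaling.<locals>.init is). A minimal stdlib-only reproduction of that failure mode, with make_init as a made-up stand-in (no Flax involved):

```python
import pickle

def make_init():
    # a local function, analogous to variance_scaling.<locals>.init
    def init():
        return 0
    return init

try:
    pickle.dumps(make_init())
except AttributeError as e:
    # stdlib pickle serializes functions by module-level reference only,
    # so a function defined inside another function cannot be pickled
    print(e)
```

cloudpickle works around this by serializing the function's code object by value instead of by reference.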

@cgarciae
Collaborator

cgarciae commented Oct 3, 2024

Hey @eugene, can you try using cloudpickle? It usually handles these cases better.

@eugene
Author

eugene commented Oct 3, 2024

Hey @cgarciae, thanks for the swift reply!

It might work, but it's still an extra dependency. It feels like all-in-one save/load functionality is so fundamental that it should be part of NNX itself, e.g. nnx.save(...) / nnx.load(...). For now I guess I will stick to saving attributes individually and combining them again upon loading:

Saving:

with open('model.pickle', 'wb') as file:
    pickle.dump({
        'opts':   model.opts, 
        'stats':  model.stats,
        'state':  nnx.state(model)
    }, file)

Loading:

with open('model.pickle', 'rb') as file:
    model_dict = pickle.load(file)

model = Model(..., rngs=nnx.Rngs(0))
model.opts = model_dict['opts']
model.stats = model_dict['stats']
nnx.update(model, model_dict['state'])
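The workaround above only requires the payload to be a plain container of picklable leaves. A stdlib-only sketch of the same round-trip pattern, with made-up values standing in for model.opts, model.stats, and nnx.state(model):

```python
import io
import pickle

# hypothetical stand-ins for model.opts, model.stats, and nnx.state(model)
payload = {
    'opts':  {'hidden': 128, 'lr': 1e-3},
    'stats': {'loss': [0.9, 0.5, 0.3]},
    'state': {'w': [1.0, 2.0], 'b': [0.0]},
}

buf = io.BytesIO()          # in place of a real file on disk
pickle.dump(payload, buf)
buf.seek(0)
restored = pickle.load(buf)

# plain containers of data round-trip cleanly, because no local
# functions or graphdef objects end up in the pickled payload
assert restored == payload
```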

@cgarciae
Collaborator

cgarciae commented Oct 4, 2024

The only problem with such a simple method (which we could add) is that it's only good for local single-host setups. In general it's recommended to use orbax for checkpointing. Here is a simple NNX example: 08_save_load_checkpoints.py

@eugene
Author

eugene commented Oct 4, 2024

@cgarciae I get your point. But even with orbax, the attributes of the model are still stored separately. Let's take a step back. The very premise of Flax NNX (and this is the first paragraph of the introduction to the documentation, emphasis mine):

Flax NNX is a new simplified API that is designed to make it easier to create, inspect, debug, and analyze neural networks in JAX. It achieves this by adding first class support for Python reference semantics. This allows users to express their models using regular Python objects, which are modeled as PyGraphs (instead of pytrees), enabling reference sharing and mutability. Such API design should make PyTorch or Keras users feel at home.

So, as you have seen, I tried expressing my model as a regular Python object, and Flax behaved unexpectedly when I tried to save it.

Also, while the multi-host scenario is important, I would bet that the vast majority of users (especially researchers like myself) work on a single host. While prototyping and benchmarking, saving a quick model (self-contained in a regular Python object with arbitrary attributes) makes a lot of sense, and I would love to have that supported out of the box.

... and the more advanced multi-host scenario can be handled by orbax.

@cgarciae
Collaborator

cgarciae commented Oct 4, 2024

@eugene I agree we want to be friendly with pickle / cloudpickle. I think we can commit to making NNX compatible with cloudpickle; I wouldn't recommend plain pickle for these types of tasks, as it cannot handle lambdas and similar objects.

I'll add a simple test for cloudpickle to begin.

@eugene
Author

eugene commented Oct 4, 2024

@cgarciae that's a step in the right direction, but I would love to hear your deeper considerations. Requiring cloudpickle, a separate dependency, just to save what is supposed to be your everyday "regular Python object" seems convoluted. Why wouldn't you want nnx.save() and nnx.load() methods mimicking the API design of PyTorch (and, by extension, honoring the Flax promise cited above, where those users should feel at home)?

@cgarciae
Collaborator

cgarciae commented Oct 4, 2024

I've created #4253 adding support for cloudpickle.

For simple use cases maybe we could add a save_state / load_state API:

# save
nnx.save_state(nnx.state(model), 'model.ckp')
# load
model = Model(...)
nnx.update(model, nnx.load_state('model.ckp'))
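A rough sketch of what such helpers could look like if they simply wrapped the stdlib pickle (hypothetical code, not the actual flax.nnx API; the names mirror the proposal above):

```python
import pickle
from pathlib import Path

def save_state(state, path):
    """Serialize a pure-data state container to disk with pickle."""
    Path(path).write_bytes(pickle.dumps(state))

def load_state(path):
    """Load a previously saved state container from disk."""
    return pickle.loads(Path(path).read_bytes())
```

Since nnx.state(model) holds only data (no graphdef, no local functions), plain pickle could suffice for this narrow single-host use case; the multi-host scenario would remain orbax territory.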

@kelechi-c

For simple use cases maybe we could add a save_state / load_state API:

# save
nnx.save_state(nnx.state(model), 'model.ckp')
# load
model = Model(...)
nnx.update(model, nnx.load_state('model.ckp'))

Is the nnx.save_state API available now, @cgarciae?

@cgarciae
Collaborator

@kelechi-c The JAX team is cooking something similar, so I'm just going to wait.

@kelechi-c

kelechi-c commented Nov 23, 2024

@cgarciae Thanks! And thanks for all the work on NNX 🫡
