Use case of multi-GPU sharding, nnx.jit, save/load and performance #4575
jecampagne asked this question in Q&A · Unanswered · Replies: 0 comments
Hello,
I have a toy example of a model using Yang Song's denoising score-matching sampling, which I am writing under different conditions to gain experience with the Flax NNX & Orbax libraries. To discuss it and ask a few questions, I have set up this notebook on Colab, just to read the code.
First question: is the `key_scorenet` key necessary or not?
Second, in `train_step`, I am not sure I have done it correctly. I observe that the sharding of `perturbed_x` is the same as that of `x` (the data), but `random_t`, the second argument of the model call, looks different: horizontal slices, with "GPU0" separated from "GPU1". So I wonder whether this is correct.
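Here is a small sketch of what I mean (it runs on any number of devices; with two GPUs the mesh would split the batch axis across GPU0/GPU1). An elementwise update such as `x + noise` preserves the sharding of `x`, so `perturbed_x` showing the same layout as `x` seems expected; `random_t` is 1-D (one scalar per batch element), so the visualizer draws it as horizontal slices, which looks different but would still be the same batch-axis sharding:

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Shard the data along the batch axis of a 1-D device mesh.
mesh = Mesh(jax.devices(), axis_names=("batch",))
x = jax.device_put(jnp.ones((8, 4)), NamedSharding(mesh, P("batch", None)))
random_t = jax.device_put(jnp.ones((8,)), NamedSharding(mesh, P("batch")))

# Elementwise op: output sharding is inherited from x.
perturbed_x = x + 0.1 * jnp.ones_like(x)  # stands in for x + sigma * noise
assert perturbed_x.sharding.is_equivalent_to(x.sharding, x.ndim)

# jax.debug.visualize_array_sharding(x)         # needs the `rich` package
# jax.debug.visualize_array_sharding(random_t)
```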
Third, `loss` is a scalar, so I do not know whether it is the mean of the loss over all the devices (model replicas)?
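My understanding, sketched below with a toy loss (the function and shapes are illustrative, not my actual model): under `jit`, `jnp.mean` over a batch-sharded per-example loss reduces across all devices, so the returned scalar is the mean over the global batch, replicated on every device.

```python
import jax
import jax.numpy as jnp

@jax.jit
def loss_fn(pred, target):
    # Per-example squared error, then global mean across the whole
    # (possibly device-sharded) batch.
    per_example = jnp.mean((pred - target) ** 2, axis=-1)
    return jnp.mean(per_example)

loss = loss_fn(jnp.ones((8, 4)), jnp.zeros((8, 4)))
print(float(loss))  # 1.0
```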
Also, do I really have to use the two `state_restored` statements successively to get the model loaded?
After that, the sampling looks ok.
Thanks for your attention.