🐛[BUG]: RuntimeError: Input type (c10::Half) and bias type (float) should be the same #874
@luke-conibear thank you for reporting. We are aware of issues with CorrDiff checkpoints, and those will be addressed by #871 once it is merged. In the meantime, you can downgrade to the last release, 1.0.1-rc, until we have a fix. @loliverhennigh @jialusui1102 for visibility.
@luke-conibear after discussion with @jialusui1102, it seems your problem is not due to checkpoints (downgrading to 1.0.1-rc should still fix your issue until we resolve this). Could you please detail how you generated the checkpoints that you want to use? Could you also confirm that you are using this config for the generation?
@CharlelieLrt this is not using old checkpoints; these are all new runs and new checkpoints for all steps. Yes, I used that exact config for generation, and I used the default configs from the CorrDiff example.
The exact commands I submitted were:
python train.py --config-name=config_training_hrrr_mini_regression.yaml ++dataset.data_path=${{inputs.data_path}} ++dataset.stats_path=${{inputs.stats_path}} ++training.hp.total_batch_size=256 ++training.hp.batch_size_per_gpu=64 ++training.perf.dataloader_workers=1 ++training.io.checkpoint_dir=${{outputs.checkpoint_dir}} ++hydra.run.dir=${{outputs.output_dir}}
python train.py --config-name=config_training_hrrr_mini_diffusion.yaml ++dataset.data_path=${{inputs.data_path}} ++dataset.stats_path=${{inputs.stats_path}} ++training.hp.total_batch_size=256 ++training.hp.batch_size_per_gpu=64 ++training.perf.dataloader_workers=1 ++training.io.regression_checkpoint_path=${{inputs.regression_checkpoint_path}} ++training.io.checkpoint_dir=${{outputs.checkpoint_dir}} ++hydra.run.dir=${{outputs.output_dir}}
python generate.py --config-name=config_generate_hrrr_mini.yaml ++dataset.data_path=${{inputs.data_path}} ++dataset.stats_path=${{inputs.stats_path}} ++generation.io.reg_ckpt_filename=${{inputs.regression_checkpoint_path}} ++generation.io.res_ckpt_filename=${{inputs.diffusion_checkpoint_path}} ++generation.io.output_filename=${{outputs.output_filename}} ++hydra.run.dir=${{outputs.output_dir}}

The regression run took the same time as before the PR.
@luke-conibear thank you for the details.
This was due to keeping AMP enabled in inference, which shouldn't be the case. It should be fixed in #882. Let me know if you still encounter this issue.
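For anyone who needs an interim workaround before that lands, here is a minimal sketch. This is not an official API; it relies on the `amp_mode` attribute visible in the traceback in this thread, and `net`, `x_hat`, and `img_lr` are placeholders for the loaded regression model and its inputs:

```python
import torch

# Sketch only: force the autocast gate off for inference so activations
# stay float32 and match the float32 biases in the checkpoint.
for module in net.modules():
    if hasattr(module, "amp_mode"):
        module.amp_mode = False

with torch.no_grad():
    image_reg = net(x=x_hat[0:1], img_lr=img_lr)
```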
Regarding the timing: we were not able to reproduce a slowdown. The runtime per forward pass that we measured during training is consistent both with the regression model (since the regression and diffusion models share the same architecture, their forward-pass runtimes should be comparable) and with the pre-PR diffusion model. Could you please share more details about your timing measurements?
@CharlelieLrt Thanks for the quick response. On the timing comment: I was confused in my comparisons, sorry for wasting time there. My mistake was that the previous run used 4 GPUs, while the new run used 2 GPUs, so double the time for half the GPUs makes sense.
@luke-conibear I am not able to reproduce the RuntimeError.
Thanks for your help
The above information is for non-patched diffusion, as I cannot get the patched version to work. I've tried many config/hydra variants, e.g., appending `model=patched_diffusion ++training.hp.patch_shape_x={patch_shape_x} ++training.hp.patch_shape_y={patch_shape_y} ++training.hp.patch_num={patch_num}` to the command, though the logs always show `Patch-based training disabled`.
@luke-conibear thank you for the details, we will try to reproduce your error with the patched version.
I've tried the exact command that you provided, with the commit that you linked, and the patch-based diffusion training works without problem for me. What values did you use for `patch_shape_x` and `patch_shape_y`? Note 1: currently, if the patch shape is not smaller than the image shape, patching is disabled. Note 2: patch-based training is designed for much larger images. It should still work on the HRRR-mini dataset, but it is not the most relevant application of patch-based diffusion.
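For context, here is an illustrative sketch of the kind of guard that produces the `Patch-based training disabled` message. This is not the actual physicsnemo code, and the shapes are example numbers, not HRRR-mini defaults:

```python
# Illustrative only: patching makes sense only when the patch is strictly
# smaller than the full image; otherwise full-image diffusion is used.
def patching_enabled(patch_shape: tuple[int, int], img_shape: tuple[int, int]) -> bool:
    return all(p < s for p, s in zip(patch_shape, img_shape))

print(patching_enabled((128, 128), (128, 128)))  # False -> patching disabled
print(patching_enabled((32, 32), (128, 128)))    # True  -> patching enabled
```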
@CharlelieLrt Okay, great, thanks a lot for the help. Yes, you're right about the patch shape: I used 32 and patched diffusion training works. Generation for patched diffusion then fails with the same error as for non-patched:

/usr/local/lib/python3.12/dist-packages/physicsnemo/utils/filesystem.py:75: SyntaxWarning: invalid escape sequence '\w'
pattern = re.compile(f"{suffix}[\w-]+(/[\w-]+)?/[\w-]+@[A-Za-z0-9.]+/[\w/](.*)")
/usr/local/lib/python3.12/dist-packages/physicsnemo/launch/logging/launch.py:321: SyntaxWarning: invalid escape sequence '\.'
key = re.sub("[^a-zA-Z0-9\.\-\s\/\_]+", "", key)
/usr/local/lib/python3.12/dist-packages/physicsnemo/utils/generative/deterministic_sampler.py:53: SyntaxWarning: invalid escape sequence '\s'
"""
/usr/local/lib/python3.12/dist-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config_generate_hrrr_mini.yaml': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
warnings.warn(msg, UserWarning)
/usr/local/lib/python3.12/dist-packages/physicsnemo/distributed/manager.py:415: UserWarning: Could not initialize using ENV, SLURM or OPENMPI methods. Assuming this is a single process job
warn(
[2025-05-14 14:05:47,457][generate][INFO] - Using dataset: hrrr_mini
[2025-05-14 14:06:04,172][generate][INFO] - Patch-based training enabled
[2025-05-14 14:06:04,172][generate][INFO] - Loading residual network from "/mnt/azureml/cr/j/.../cap/data-capability/wd/INPUT_diffusion_checkpoint_path/EDMPrecondSuperResolution.0.8000000.mdlus"...
[2025-05-14 14:06:04,955][generate][INFO] - Loading network from "/mnt/azureml/cr/j/.../cap/data-capability/wd/INPUT_regression_checkpoint_path/UNet.0.2001920.mdlus"...
[2025-05-14 14:06:05,240][generate][INFO] - Generating images, saving results to /mnt/azureml/cr/j/.../cap/data-capability/wd/output_filename/sample.nc...
[2025-05-14 14:06:06,021][generate][INFO] - starting index: 0
/usr/local/lib/python3.12/dist-packages/physicsnemo/models/diffusion/layers.py:701: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with amp.autocast(enabled=self.amp_mode):
Error executing job with overrides: ['generation=patched', '++dataset.data_path=/mnt/azureml/cr/j/.../cap/data-capability/wd/INPUT_data_path/hrrr_mini_train.nc', '++dataset.stats_path=/mnt/azureml/cr/j/.../cap/data-capability/wd/INPUT_stats_path/stats.json', '++generation.io.reg_ckpt_filename=/mnt/azureml/cr/j/.../cap/data-capability/wd/INPUT_regression_checkpoint_path/UNet.0.2001920.mdlus', '++generation.io.res_ckpt_filename=/mnt/azureml/cr/j/.../cap/data-capability/wd/INPUT_diffusion_checkpoint_path/EDMPrecondSuperResolution.0.8000000.mdlus', '++generation.io.output_filename=/mnt/azureml/cr/j/.../cap/data-capability/wd/output_filename/sample.nc', '++generation.has_lead_time=False', '++generation.num_ensembles=2', '++generation.times=[2020-02-02T00:00:00]', '++generation.patch_shape_x=32', '++generation.patch_shape_y=32']
Traceback (most recent call last):
File "/mnt/azureml/cr/j/.../exe/wd/generate.py", line 396, in <module>
main()
File "/usr/local/lib/python3.12/dist-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/azureml/cr/j/.../exe/wd/generate.py", line 350, in main
image_out = generate_fn()
^^^^^^^^^^^^^
File "/mnt/azureml/cr/j/.../exe/wd/generate.py", line 198, in generate_fn
image_reg = regression_step(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/physicsnemo/utils/corrdiff/utils.py", line 84, in regression_step
x = net(x=x_hat[0:1], img_lr=img_lr)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/physicsnemo/models/diffusion/unet.py", line 165, in forward
F_x = self.model(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/physicsnemo/models/diffusion/song_unet.py", line 703, in forward
return super().forward(x, noise_labels, class_labels, augment_labels)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/physicsnemo/models/diffusion/song_unet.py", line 450, in forward
x = block(x, emb)
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/physicsnemo/models/diffusion/layers.py", line 703, in forward
x = self.proj(attn.reshape(*x.shape)).add_(x)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/physicsnemo/models/diffusion/layers.py", line 285, in forward
x = torch.nn.functional.conv2d(x, w, padding=w_pad, bias=b)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
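For anyone landing here from a search: the failure at the bottom of this traceback is easy to reproduce in isolation, since `conv2d` requires the input, weight, and bias to share a dtype. A standalone sketch follows (it assumes a CUDA device and is unrelated to the physicsnemo code paths):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, device="cuda", dtype=torch.half)  # half input, as under autocast
w = torch.randn(4, 3, 3, 3, device="cuda", dtype=torch.half)  # half weights
b = torch.randn(4, device="cuda", dtype=torch.float)          # float bias, as in the checkpoint

try:
    F.conv2d(x, w, bias=b, padding=1)
except RuntimeError as e:
    print(e)  # Input type (c10::Half) and bias type (float) should be the same

# Aligning the dtypes (casting the bias down, or the input up) resolves it.
out = F.conv2d(x, w, bias=b.half(), padding=1)
```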
Great to know! We will update the log messages to more clearly explain why patching is disabled in this case. Regarding your runtime error in generation: both issues will be fixed once #885 is merged.
@CharlelieLrt Thanks a lot for the great help here. I confirm this is fixed. |
Version
Latest from main branch
On which installation method(s) does this occur?
Source
Describe the issue
Following this PR, the CorrDiff example fails during generation (see the traceback above).
The weights for both regression and diffusion are new following this PR too.