[single file] Cosmos #11801


Open · wants to merge 5 commits into main


Conversation

a-r-r-o-w (Member) commented Jun 24, 2025

Possibly fixes #11798

We can run inference with the 7B Text-to-World model with the following code:

import torch
from diffusers import CosmosTextToWorldPipeline, CosmosTransformer3DModel
from diffusers.utils import export_to_video

model_id = "nvidia/Cosmos-1.0-Diffusion-7B-Text2World"
transformer_single_file = "https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Text2World/blob/main/model.pt"

transformer = CosmosTransformer3DModel.from_single_file(transformer_single_file, torch_dtype=torch.bfloat16).to("cuda")
pipe = CosmosTextToWorldPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."

output = pipe(prompt=prompt).frames[0]
export_to_video(output, "output.mp4", fps=30)

@DN6 I'm not sure I remember how to support different versions of the same model. With the current implementation, loading the 14B model would fail with a weight shape mismatch, most likely due to config-related issues. Could you share some insights?

For the Cosmos 1.0 text-to-world and video-to-world models (7B and 14B), I'll have to make a cosmos-1.0 entry, and another entry, cosmos-2.0, for the Cosmos Predict2 models. But what's the normal process for models of the same family with different parameter sizes?

a-r-r-o-w requested a review from DN6 on June 24, 2025 20:25
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Vargol commented Jun 26, 2025

While I'm not an expert on the diffusers code base, as far as I can see, the different parameter counts are simply treated as different model types, based on Wan, which also ships in multiple sizes. For example, in src/diffusers/loaders/single_file_utils.py:

        if checkpoint[target_key].shape[0] == 1536:
            model_type = "wan-t2v-1.3B"
        elif checkpoint[target_key].shape[0] == 5120 and checkpoint[target_key].shape[1] == 16:
            model_type = "wan-t2v-14B"
        else:
            model_type = "wan-i2v-14B"

DN6 (Collaborator) commented Jun 27, 2025

@a-r-r-o-w I think you can just run a shape check on the params to determine which config to use. That should be sufficient to differentiate the variants, I think?
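For reference, a minimal sketch of what such a shape check could look like, modeled on the Wan snippet above. The checkpoint key and the hidden sizes below are illustrative assumptions, not the actual Cosmos checkpoint layout:

```python
def infer_cosmos_model_type(shapes: dict) -> str:
    # `shapes` maps checkpoint keys to shape tuples. Both the key name
    # and the hidden sizes (4096 for 7B, 5120 for 14B) are hypothetical,
    # chosen only to illustrate the shape-check approach.
    hidden_dim = shapes["net.blocks.0.attn.to_q.weight"][0]
    if hidden_dim == 4096:
        return "cosmos-1.0-t2w-7B"
    return "cosmos-1.0-t2w-14B"

# A fake 7B-shaped checkpoint resolves to the 7B config.
print(infer_cosmos_model_type({"net.blocks.0.attn.to_q.weight": (4096, 4096)}))
```

The same dispatch would then pick the matching entry in the config mapping, just as the Wan branch does.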

a-r-r-o-w marked this pull request as ready for review on June 27, 2025 20:04
a-r-r-o-w (Member, Author)

@Vargol Could you verify if the latest changes work for you?

Vargol commented Jun 27, 2025

The Cosmos 2B single file at https://huggingface.co/nvidia/Cosmos-Predict2-2B-Text2Image/resolve/main/model.pt loaded, ran successfully, and generated the expected image.

I tried a GGUF file for the 14B version and that didn't work. I'm not sure if that was in scope, though. If it was, the error is:

$ python cosmos_gguf_prmpts.py 
Multiple distributions found for package optimum. Picked distribution: optimum-quanto
WARNING:torchao.kernel.intmm:Warning: Detected no triton, on systems without Triton certain kernels will not work
W0627 23:30:48.574000 85696 lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
The config attributes {'input_types': ['text'], 'model_size': '14b'} were passed to CosmosTransformer3DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Traceback (most recent call last):
  File "/Volumes/SSD2TB/AI/Diffusers/cosmos_gguf_prmpts.py", line 12, in <module>
    transformer = CosmosTransformer3DModel.from_single_file(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/loaders/single_file_model.py", line 420, in from_single_file
    load_model_dict_into_meta(
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/models/model_loading_utils.py", line 285, in load_model_dict_into_meta
    hf_quantizer.check_quantized_param_shape(param_name, empty_state_dict[param_name], param)
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/quantizers/gguf/gguf_quantizer.py", line 84, in check_quantized_param_shape
    raise ValueError(
ValueError: patch_embed.proj.weight has an expected quantized shape of: (5120, 68), but received shape: torch.Size([5120, 136])
$ 
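One thing worth noting about that traceback: the received width (136) is exactly twice the expected one (68), which looks more like the 14B GGUF being matched against the wrong config variant than a quantization bug. That interpretation is a guess on my part; the arithmetic itself is just:

```python
def shape_mismatch_ratio(expected, received):
    # Per-axis ratio of the received quantized shape to the expected one.
    return tuple(r / e for e, r in zip(expected, received))

# Shapes taken directly from the ValueError above.
print(shape_mismatch_ratio((5120, 68), (5120, 136)))  # (1.0, 2.0)
```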


Successfully merging this pull request may close these issues.

Single File and GGUF support of Cosmos-Predict2
4 participants