[single file] Cosmos #11801
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
While I'm not an expert on the diffusers code base, as far as I can see, based on WAN (which also comes in multiple parameter counts), the variants are simply treated as different model types, e.g. in:

```python
if checkpoint[target_key].shape[0] == 1536:
    model_type = "wan-t2v-1.3B"
elif checkpoint[target_key].shape[0] == 5120 and checkpoint[target_key].shape[1] == 16:
    model_type = "wan-t2v-14B"
else:
    model_type = "wan-i2v-14B"
```
@a-r-r-o-w I think we can just run a shape check on the params to determine which config to use. That should be sufficient to differentiate them, I think?
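A minimal sketch of what such a shape check could look like for Cosmos, mirroring the WAN snippet above. The key name and the hidden sizes below are assumptions for illustration, not the actual mapping used in this PR:

```python
def infer_cosmos_model_type(checkpoint):
    # Assumed key for the input/patch embedding projection; the real
    # single-file mapping may use a different name.
    target_key = "net.x_embedder.proj.1.weight"
    hidden_dim = checkpoint[target_key].shape[0]
    if hidden_dim == 2048:  # assumed hidden size of the 2B variant
        model_type = "cosmos-predict2-2B"
    elif hidden_dim == 5120:  # assumed hidden size of the 14B variant
        model_type = "cosmos-predict2-14B"
    else:
        raise ValueError(f"Unrecognized Cosmos checkpoint (hidden dim {hidden_dim})")
    return model_type
```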
@Vargol Could you verify if the latest changes work for you?
The Cosmos 2B single file at https://huggingface.co/nvidia/Cosmos-Predict2-2B-Text2Image/resolve/main/model.pt loaded, ran successfully, and generated the expected image. I also tried a GGUF file for the 14B version and that didn't work; I'm not sure if that was in scope though. If it was:

```
$ python cosmos_gguf_prmpts.py
Multiple distributions found for package optimum. Picked distribution: optimum-quanto
WARNING:torchao.kernel.intmm:Warning: Detected no triton, on systems without Triton certain kernels will not work
W0627 23:30:48.574000 85696 lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
The config attributes {'input_types': ['text'], 'model_size': '14b'} were passed to CosmosTransformer3DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Traceback (most recent call last):
  File "/Volumes/SSD2TB/AI/Diffusers/cosmos_gguf_prmpts.py", line 12, in <module>
    transformer = CosmosTransformer3DModel.from_single_file(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/loaders/single_file_model.py", line 420, in from_single_file
    load_model_dict_into_meta(
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/models/model_loading_utils.py", line 285, in load_model_dict_into_meta
    hf_quantizer.check_quantized_param_shape(param_name, empty_state_dict[param_name], param)
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/quantizers/gguf/gguf_quantizer.py", line 84, in check_quantized_param_shape
    raise ValueError(
ValueError: patch_embed.proj.weight has an expected quantized shape of: (5120, 68), but received shape: torch.Size([5120, 136])
$
```
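For reference, the failing script was presumably loading the GGUF file along these lines. This is a rough reconstruction, not the actual script; the GGUF repo and file name are hypothetical placeholders:

```python
import torch
from diffusers import CosmosTransformer3DModel, GGUFQuantizationConfig

# Hypothetical GGUF quantization of the 14B checkpoint
gguf_path = "https://huggingface.co/some-user/Cosmos-Predict2-14B-GGUF/blob/main/model-Q8_0.gguf"

transformer = CosmosTransformer3DModel.from_single_file(
    gguf_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```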
Possibly fixes #11798
We can run inference with the 7B Text-to-World model with the following code:
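The exact snippet from the PR description isn't preserved in this extract; the following is a rough sketch of what single-file loading plus inference could look like, assuming the `CosmosTransformer3DModel` / `CosmosTextToWorldPipeline` APIs and a hypothetical checkpoint URL (the file name, prompt, and generation arguments are assumptions). Note the Cosmos pipelines ship with a guardrail safety checker, which may require the `cosmos_guardrail` package to be installed:

```python
import torch
from diffusers import CosmosTextToWorldPipeline, CosmosTransformer3DModel
from diffusers.utils import export_to_video

# Hypothetical single-file checkpoint URL for the 7B Text2World transformer
ckpt_url = "https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Text2World/blob/main/model.pt"

# Load only the transformer from the original single-file checkpoint
transformer = CosmosTransformer3DModel.from_single_file(ckpt_url, torch_dtype=torch.bfloat16)

# Reuse the remaining components (text encoder, VAE, scheduler) from the diffusers-format repo
pipe = CosmosTextToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Text2World",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

prompt = "A robot arm assembles a wooden chair in a sunlit workshop."
video = pipe(prompt=prompt).frames[0]
export_to_video(video, "cosmos_7b_t2w.mp4", fps=30)
```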
@DN6 I'm not sure I remember how to support different versions of the same model. With the current implementation, if we tried loading the 14B model, it would fail with a weight shape mismatch. This most likely has to do with config-related issues. Could you share some insights?
For the Cosmos 1.0 text-to-world and video-to-world models (7B and 14B), I'll have to make a `cosmos-1.0` entry, and another `cosmos-2.0` entry for the Cosmos Predict2 models. But what's the normal process for models of the same family with different parameter sizes?