
RuntimeError: Given groups=1, weight of size [320, 8, 3, 3], expected input[46, 9, 32, 80] to have 8 channels, but got 9 channels instead #8

Open
redman4585 opened this issue Jan 7, 2025 · 12 comments

Comments

@redman4585

The 1st step of StereoCrafter runs with no problems on my RTX 2060.

The 2nd step gives me this when I try to run inpainting_inference.py:

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:04<00:00, 1.24it/s]
0%| | 0/8 [00:03<?, ?it/s]
Traceback (most recent call last):
File "inpainting_inference.py", line 297, in
Fire(main)
File "F:\anaconda3\envs\zoe\lib\site-packages\fire\core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "F:\anaconda3\envs\zoe\lib\site-packages\fire\core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "F:\anaconda3\envs\zoe\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "inpainting_inference.py", line 242, in main
video_latents = spatial_tiled_process(
File "inpainting_inference.py", line 77, in spatial_tiled_process
tile = process_func(
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "F:\anaconda3\envs\zoe\StereoCrafter\pipelines\stereo_video_inpainting.py", line 565, in call
noise_pred = self.unet(
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "F:\anaconda3\envs\zoe\lib\site-packages\diffusers\models\unets\unet_spatio_temporal_condition.py", line 428, in forward
sample = self.conv_in(sample)
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 8, 3, 3], expected input[46, 9, 32, 80] to have 8 channels, but got 9 channels instead

The code was tested on CUDA 11.8 with all the recommended requirements installed.

@redman4585
Author

Update: I fixed it. One of the files in the StereoCrafter weights had been renamed.
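
For anyone else hitting this: the `[320, 8, 3, 3]` weight means the UNet that got loaded only takes 8 input channels, while the inpainting pipeline feeds it 9 (the extra channel is the mask), so the wrong checkpoint is probably being picked up. A quick way to check which UNet config is actually loading (a minimal sketch; the path is a placeholder for wherever your weights folder is):

```python
from diffusers import UNetSpatioTemporalConditionModel

# Placeholder path: point this at the UNet you pass to inpainting_inference.py.
config = UNetSpatioTemporalConditionModel.load_config(
    "./weights/StereoCrafter", subfolder="unet"
)

# The StereoCrafter inpainting UNet should report 9 input channels;
# 8 means the base SVD UNet weights are being loaded instead.
print("in_channels:", config["in_channels"])
```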

@Archit01

Could you please tell me whether both steps can be run with 8 GB of VRAM? I have an RTX 2060 Super.

Are there any other steps required to run on an 8 GB VRAM GPU?

> Update: I fixed it. One of the files in the StereoCrafter weights had been renamed.

@redman4585
Author

Yeah. I'm using the regular RTX 2060, which has 6 GB of VRAM.

You are limited to certain resolutions though.

1920x1080 content should be limited to 10 seconds for step 1. The higher the fps, the fewer seconds you should feed in per clip to save VRAM.

Take your splatted video and downscale it to 2048x1024 or 2560x1280 for step 2.

2048x1024 splatted videos should load into your VRAM without x2 tiling.

2560x1280 splatted videos might be possible too. Experiment with resolutions and make sure both dimensions divide evenly by 128.
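
If it helps, here is a tiny sketch (plain Python, nothing from the repo) for snapping a target resolution down to the nearest multiple of 128:

```python
def snap_down(value: int, multiple: int = 128) -> int:
    """Round a dimension down to the nearest multiple of `multiple`."""
    return (value // multiple) * multiple

# Example: a 2560x1440 target would become 2560x1408.
print(snap_down(2560), snap_down(1440))  # -> 2560 1408
```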

@Archit01

> Yeah. I'm using the regular RTX 2060, which has 6 GB of VRAM.
>
> You are limited to certain resolutions though.
>
> 1920x1080 content should be limited to 10 seconds for step 1. The higher the fps, the fewer seconds you should feed in per clip to save VRAM.
>
> Take your splatted video and downscale it to 2048x1024 or 2560x1280 for step 2.
>
> 2048x1024 splatted videos should load into your VRAM without x2 tiling.
>
> 2560x1280 splatted videos might be possible too. Experiment with resolutions and make sure both dimensions divide evenly by 128.

I was not able to run a 1080p video in step 1; 720x576 worked for me, although I need to test more. Step 1 was also extremely slow. Is there anything that can be done? I have 8 GB of VRAM and 16 GB of RAM. Is there a way to split step 1 so it first creates the depth map and then does the splatting, or to use a custom depth map?

Also, how do I limit the second step to SBS output only?

@redman4585
Author

redman4585 commented Jan 14, 2025

I recommend using depth videos already generated by DepthCrafter or Depth Anything. It's the easiest way to get 1080p splatted videos.

I don't know how to make it output SBS by itself. I use a script that just outputs the right-eye view as the result.
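
If it helps, combining the two views into a full-width SBS video afterwards is straightforward; a minimal sketch with OpenCV (the filenames are placeholders, and this isn't part of StereoCrafter):

```python
import cv2

# Placeholder filenames: the original left view and the inpainted right view.
# Both are assumed to have the same resolution and fps.
left = cv2.VideoCapture("left_eye.mp4")
right = cv2.VideoCapture("right_eye.mp4")

fps = left.get(cv2.CAP_PROP_FPS)
w = int(left.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(left.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("sbs.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w * 2, h))

while True:
    ok_l, frame_l = left.read()
    ok_r, frame_r = right.read()
    if not (ok_l and ok_r):
        break
    out.write(cv2.hconcat([frame_l, frame_r]))  # left | right, side by side

left.release()
right.release()
out.release()
```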

@Archit01

> I recommend using depth videos already generated by DepthCrafter or Depth Anything. It's the easiest way to get 1080p splatted videos.
>
> I don't know how to make it output SBS by itself. I use a script that just outputs the right-eye view as the result.

I have done more tests: I could use a 1080p video with a 2-second clip.

However, I have found that when I use lower resolutions like 720x406 or 720p, the depth effect is noticeably better.

Another thing: even if I use a 1080p or higher-resolution video, the right-side view lacks detail, whereas in the camel example image on GitHub the right side still has better detail than mine.

Lastly, could you please explain how to use an already generated depth video and then just do the splatting? What should the commands and input videos be?

@xiaoyu258
Contributor

Hi Archit01, maybe you could check this #2 (comment) for using an already generated depth video.

@Archit01

> Hi Archit01, maybe you could check this #2 (comment) for using an already generated depth video.

Thank you. I have checked that comment, but as I am not much of a technical person, an example of how to execute the commands with the proper arguments would be much appreciated. For example, if I have two video files, one the left-eye view (the original video) and the other its depth-map video, how do I run the splatting to create the grid?

@redman4585
Author

> > Hi Archit01, maybe you could check this #2 (comment) for using an already generated depth video.
>
> Thank you. I have checked that comment, but as I am not much of a technical person, an example of how to execute the commands with the proper arguments would be much appreciated. For example, if I have two video files, one the left-eye view (the original video) and the other its depth-map video, how do I run the splatting to create the grid?

I use the inference script from here:

https://github.com/enoky/StereoCrafter

Download depth_splatting.py from the files section.

Set up the command like this:

python depth_splatting.py --input_source_clips <path_to_input_videos> --input_depth_maps <path_to_pre_rendered_depth_maps> --output_splatted <path_to_output_videos> --unet_path <path_to_unet_model> --pre_trained_path <path_to_depthcrafter_model> --max_disp 20 --process_length -1 --batch_size 10

Change the parts in angle brackets to the folder paths on your PC.

You can change --max_disp to 30 for a stronger 3D effect.

@redman4585
Author

Make sure the 2D video and the depth video have the same filename, resolution, and fps for it to work.
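
A quick way to sanity-check that before splatting (a minimal sketch with OpenCV; the paths are placeholders):

```python
import cv2

def video_props(path):
    """Return (width, height, fps) for a video file."""
    cap = cv2.VideoCapture(path)
    props = (
        int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
        round(cap.get(cv2.CAP_PROP_FPS), 3),
    )
    cap.release()
    return props

# Placeholder paths: the source clip and its pre-rendered depth map.
src = video_props("input_source_clips/clip.mp4")
dep = video_props("input_depth_maps/clip.mp4")
print("source:", src, "depth:", dep)
assert src == dep, "resolution/fps mismatch between source and depth video"
```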

@Archit01

> Make sure the 2D video and the depth video have the same filename, resolution, and fps for it to work.

Thank you for all the suggestions. I will try it today.

@Archit01

Archit01 commented Jan 14, 2025

Thank you, I got the custom depth maps working and can now process longer videos at 1080p.

The only thing left is that the inpainted right view lacks detail, and for some reason it can't match the quality of the example shown even at high resolution.
