
RuntimeError: Given groups=1, weight of size [320, 8, 3, 3], expected input[46, 9, 32, 80] to have 8 channels, but got 9 channels instead #8

Open
redman4585 opened this issue Jan 7, 2025 · 12 comments

Comments

@redman4585

The 1st step of StereoCrafter runs with no problems on my RTX 2060.

The 2nd step gives me this when I try to run inpainting_inference.py:

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:04<00:00, 1.24it/s]
0%| | 0/8 [00:03<?, ?it/s]
Traceback (most recent call last):
File "inpainting_inference.py", line 297, in
Fire(main)
File "F:\anaconda3\envs\zoe\lib\site-packages\fire\core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "F:\anaconda3\envs\zoe\lib\site-packages\fire\core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "F:\anaconda3\envs\zoe\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "inpainting_inference.py", line 242, in main
video_latents = spatial_tiled_process(
File "inpainting_inference.py", line 77, in spatial_tiled_process
tile = process_func(
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "F:\anaconda3\envs\zoe\StereoCrafter\pipelines\stereo_video_inpainting.py", line 565, in call
noise_pred = self.unet(
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "F:\anaconda3\envs\zoe\lib\site-packages\diffusers\models\unets\unet_spatio_temporal_condition.py", line 428, in forward
sample = self.conv_in(sample)
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "F:\anaconda3\envs\zoe\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 8, 3, 3], expected input[46, 9, 32, 80] to have 8 channels, but got 9 channels instead

The code was tested on CUDA 11.8 with all the recommended requirements installed.

@redman4585
Author

Update: I fixed it. One of the files in the StereoCrafter weights had been renamed.
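
For anyone else hitting this: the `[320, 8, 3, 3]` weight means the UNet that got loaded only takes 8 input channels, while the inpainting pipeline feeds it 9 (the extra channel is the mask), so the wrong checkpoint is probably being picked up. A quick way to check which UNet config is actually loading (a minimal sketch; the path is a placeholder for wherever your weights folder is):

```python
from diffusers import UNetSpatioTemporalConditionModel

# Placeholder path: point this at the UNet you pass to inpainting_inference.py.
config = UNetSpatioTemporalConditionModel.load_config(
    "./weights/StereoCrafter", subfolder="unet"
)

# The StereoCrafter inpainting UNet should report 9 input channels;
# 8 means the base SVD UNet weights are being loaded instead.
print("in_channels:", config["in_channels"])
```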

@Archit01

Could you please tell me whether both steps can be run with 8 GB of VRAM? I have an RTX 2060 Super.

Are there any other steps required to run on an 8 GB VRAM GPU?

> Update: I fixed it. One of the files in the StereoCrafter weights had been renamed.

@redman4585
Author

Yeah. I'm using the regular RTX 2060, which has 6 GB of VRAM.

You are limited to certain resolutions though.

1920x1080 content should be limited to 10 seconds for step 1. The higher the fps, the fewer seconds you should feed in per clip to save VRAM.

Take your splatted video and downscale it to 2048x1024 or 2560x1280 for step 2.

2048x1024 splatted videos should load into your VRAM without x2 tiling.

2560x1280 splatted videos might be possible too. Experiment with resolutions and make sure both dimensions divide evenly by 128.
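
If it helps, here is a tiny sketch (plain Python, nothing from the repo) for snapping a target resolution down to the nearest multiple of 128:

```python
def snap_down(value: int, multiple: int = 128) -> int:
    """Round a dimension down to the nearest multiple of `multiple`."""
    return (value // multiple) * multiple

# Example: a 2560x1440 target would become 2560x1408.
print(snap_down(2560), snap_down(1440))  # -> 2560 1408
```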

@Archit01

> Yeah. I'm using the regular RTX 2060, which has 6 GB of VRAM.
>
> You are limited to certain resolutions though.
>
> 1920x1080 content should be limited to 10 seconds for step 1. The higher the fps, the fewer seconds you should feed in per clip to save VRAM.
>
> Take your splatted video and downscale it to 2048x1024 or 2560x1280 for step 2.
>
> 2048x1024 splatted videos should load into your VRAM without x2 tiling.
>
> 2560x1280 splatted videos might be possible too. Experiment with resolutions and make sure both dimensions divide evenly by 128.

I was not able to run a 1080p video in step 1; 720x576 worked for me, although I need to test more. Step 1 was also extremely slow. Is there anything that can be done? I have 8 GB of VRAM and 16 GB of RAM. Is there a way to split step 1 so it first creates the depth map and then does the splatting, or to use a custom depth map?

Also, how do I limit the second step to SBS output only?

@redman4585
Author

redman4585 commented Jan 14, 2025

I recommend using depth videos already generated by DepthCrafter or Depth Anything. It's the easiest way to get 1080p splatted videos.

I don't know how to make it output SBS by itself. I use a script that just outputs the right-eye view as the result.
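
If it helps, combining the two views into a full-width SBS video afterwards is straightforward; a minimal sketch with OpenCV (the filenames are placeholders, and this isn't part of StereoCrafter):

```python
import cv2

# Placeholder filenames: the original left view and the inpainted right view.
# Both are assumed to have the same resolution and fps.
left = cv2.VideoCapture("left_eye.mp4")
right = cv2.VideoCapture("right_eye.mp4")

fps = left.get(cv2.CAP_PROP_FPS)
w = int(left.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(left.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("sbs.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w * 2, h))

while True:
    ok_l, frame_l = left.read()
    ok_r, frame_r = right.read()
    if not (ok_l and ok_r):
        break
    out.write(cv2.hconcat([frame_l, frame_r]))  # left | right, side by side

left.release()
right.release()
out.release()
```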

@Archit01

> I recommend using depth videos already generated by DepthCrafter or Depth Anything. It's the easiest way to get 1080p splatted videos.
>
> I don't know how to make it output SBS by itself. I use a script that just outputs the right-eye view as the result.

I have done more tests: I could use a 1080p video with a 2-second clip.

However, I have found that when I use lower resolutions like 720x406 or 720p, the depth effect is noticeably better.

Another thing: even if I use a 1080p or higher-resolution video, the right-side view lacks detail, whereas in the camel example image on GitHub the right side still has better detail than mine.

Lastly, could you please explain how to use an already generated depth video and then just do the splatting? What should the commands and input videos be?

@xiaoyu258
Contributor

Hi Archit01, maybe you could check this #2 (comment) for using an already generated depth video.

@Archit01

> Hi Archit01, maybe you could check this #2 (comment) for using an already generated depth video.

Thank you. I have checked that comment, but as I am not much of a technical person, an example of how to execute the commands with the proper arguments would be much appreciated. For example, if I have two video files, one the left-eye view (the original video) and the other its depth-map video, how do I run the splatting to create the grid?

@redman4585
Author

> > Hi Archit01, maybe you could check this #2 (comment) for using an already generated depth video.
>
> Thank you. I have checked that comment, but as I am not much of a technical person, an example of how to execute the commands with the proper arguments would be much appreciated. For example, if I have two video files, one the left-eye view (the original video) and the other its depth-map video, how do I run the splatting to create the grid?

I use the inference script from here:

https://github.com/enoky/StereoCrafter

Download depth_splatting.py from the files section.

Set up the command like this:

python depth_splatting.py --input_source_clips <path_to_input_videos> --input_depth_maps <path_to_pre_rendered_depth_maps> --output_splatted <path_to_output_videos> --unet_path <path_to_unet_model> --pre_trained_path <path_to_depthcrafter_model> --max_disp 20 --process_length -1 --batch_size 10

Change the parts in angle brackets to the folder paths on your PC.

You can change --max_disp to 30 for a stronger 3D effect.

@redman4585
Author

Make sure the 2D video and the depth video have the same filename, resolution, and fps for it to work.
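
A quick way to sanity-check that before splatting (a minimal sketch with OpenCV; the paths are placeholders):

```python
import cv2

def video_props(path):
    """Return (width, height, fps) for a video file."""
    cap = cv2.VideoCapture(path)
    props = (
        int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
        round(cap.get(cv2.CAP_PROP_FPS), 3),
    )
    cap.release()
    return props

# Placeholder paths: the source clip and its pre-rendered depth map.
src = video_props("input_source_clips/clip.mp4")
dep = video_props("input_depth_maps/clip.mp4")
print("source:", src, "depth:", dep)
assert src == dep, "resolution/fps mismatch between source and depth video"
```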

@Archit01

> Make sure the 2D video and the depth video have the same filename, resolution, and fps for it to work.

Thank you for all the suggestions. I will try it today.

@Archit01

Archit01 commented Jan 14, 2025

Thank you, I got the custom depth maps working and can now process longer videos at 1080p.

The only thing left is that the inpainted right view lacks detail, and for some reason it can't match the quality of the example shown even at high resolution.
