Running on the new dataset #2

Closed
Huangmr0719 opened this issue Apr 15, 2025 · 3 comments
@Huangmr0719

Thank you for your excellent work! I would like to use chapter-llama to run experiments on other datasets with test.py and explore its generalization performance. What pre-processing should I perform on these videos? After that, do I just need to replace the original content in the config with annotations from the new dataset?

@lucas-ventura
Owner

Hi @Huangmr0719 , thank you!

It depends on the dataset: do the videos have ASR? And would you like to run the model with ASR only, captions only, ASR + captions, or even ASR + embeddings (SigLIP) + captions?

If the videos already have ASR, the easiest and fastest way to get started is to run it with ASR only. You can find the relevant commands under Single Video Chaptering 📹 in the Quick Start section. That allows you to test the model on a single video easily.

If you want to use test.py and/or include captions, you’ll need to extract them first. Please check the how-to-extract-video-captions guide for that, and let me know if anything’s unclear.

You’ll also need to create the following files under path/to/dataset/docs/subset/:

  • your-subset.json: a list of the video IDs in your dataset
  • chapters/chapters_your-subset.json: a dictionary keyed by video ID, where each value is a dictionary with the video’s duration in seconds under the key "duration". If you have ground-truth chapters, you can also include them here for evaluation later.
  • asrs/asrs_your-subset.json: a dictionary keyed by video ID containing the ASR segments, each with "text", "start", and "end" fields. These are typically extracted using WhisperX. You can check my inference.py and the examples here to get an idea of the format (a rough sketch of all three files follows below).
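
To make the layout concrete, here is a minimal sketch of how those three files could be generated. The video IDs are placeholders and the ASR entry is assumed to be a WhisperX-style list of segments per video, so double-check the exact field layout against inference.py and the linked examples:

import json
from pathlib import Path

docs = Path("path/to/dataset/docs/subset")  # same root as above
(docs / "chapters").mkdir(parents=True, exist_ok=True)
(docs / "asrs").mkdir(parents=True, exist_ok=True)

# your-subset.json: a plain list of video IDs (IDs here are hypothetical)
video_ids = ["video_001", "video_002"]
(docs / "your-subset.json").write_text(json.dumps(video_ids, indent=2))

# chapters/chapters_your-subset.json: duration in seconds per video;
# ground-truth chapters can also be added here for later evaluation
chapters = {
    "video_001": {"duration": 1830.0},
    "video_002": {"duration": 945.5},
}
(docs / "chapters" / "chapters_your-subset.json").write_text(json.dumps(chapters, indent=2))

# asrs/asrs_your-subset.json: ASR segments with "text", "start", "end"
# (assumed WhisperX-style segment list; verify against inference.py)
asrs = {
    "video_001": [
        {"text": "Welcome to the video.", "start": 0.0, "end": 2.4},
        {"text": "Today we will cover...", "start": 2.4, "end": 5.1},
    ],
    "video_002": [],
}
(docs / "asrs" / "asrs_your-subset.json").write_text(json.dumps(asrs, indent=2))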

Once you have the captions extracted and the files above ready, you can extract the chapters like this:

python test.py subset=your-subset prompt=captions_asr

I don’t have a script ready yet for running a single video with captions, as I haven’t had the time, but it’s on my to-do list.

Hope that helps!

@Huangmr0719
Author

Thank you very much for your answer; it fully addresses my question. I believe I need the ASR + captions setting with test.py, so I will try to extract the captions and chapters following your guidance.

Once again, thank you for your answer, and I look forward to your future work!

@lucas-ventura
Owner

You're very welcome, glad it helped!

Just a quick tip: before going all in with ASR + captions and test.py, I’d try running the ASR-only setup with inference.py on a single video first. It's a simple way to make sure everything's working fine.

Good luck with the experiments, and let me know if anything comes up!
