Running on the new dataset #2

Closed
Huangmr0719 opened this issue Apr 15, 2025 · 3 comments
@Huangmr0719

Thank you for your excellent work! I would like to use chapter-llama to run experiments on other datasets with test.py and explore its generalization performance. What pre-processing should I perform on these videos? After that, do I just need to replace the original content in the config with annotations from the new dataset?

@lucas-ventura
Owner

Hi @Huangmr0719 , thank you!

It depends on the dataset: do the videos have ASR? And would you like to run the model with ASR only, captions only, ASR + captions, or even ASR + embeddings (SigLIP) + captions?

If the videos already have ASR, the easiest and fastest way to get started is to run it with ASR only. You can find the relevant commands under Single Video Chaptering 📹 in the Quick Start section. That allows you to test the model on a single video easily.

If you want to use test.py and/or include captions, you’ll need to extract them first. Please check the how-to-extract-video-captions guide for that, and let me know if anything’s unclear.

You’ll also need to create the following files under path/to/dataset/docs/subset/:

  • your-subset.json: a list of the video IDs in your dataset
  • chapters/chapters_your-subset.json: a dictionary keyed by video ID, where each value is a dictionary with the video’s duration in seconds under the key "duration". If you have ground-truth chapters, you can also include them here for evaluation later.
  • asrs/asrs_your-subset.json: a dictionary keyed by video ID containing the ASR segments, each with "text", "start", and "end" fields. These are typically extracted using WhisperX. You can check my inference.py and the examples here to get an idea of the format (a rough sketch of all three files follows below).
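
To make the layout concrete, here is a minimal sketch of how those three files could be generated. The video IDs are placeholders and the ASR entry is assumed to be a WhisperX-style list of segments per video, so double-check the exact field layout against inference.py and the linked examples:

import json
from pathlib import Path

docs = Path("path/to/dataset/docs/subset")  # same root as above
(docs / "chapters").mkdir(parents=True, exist_ok=True)
(docs / "asrs").mkdir(parents=True, exist_ok=True)

# your-subset.json: a plain list of video IDs (IDs here are hypothetical)
video_ids = ["video_001", "video_002"]
(docs / "your-subset.json").write_text(json.dumps(video_ids, indent=2))

# chapters/chapters_your-subset.json: duration in seconds per video;
# ground-truth chapters can also be added here for later evaluation
chapters = {
    "video_001": {"duration": 1830.0},
    "video_002": {"duration": 945.5},
}
(docs / "chapters" / "chapters_your-subset.json").write_text(json.dumps(chapters, indent=2))

# asrs/asrs_your-subset.json: ASR segments with "text", "start", "end"
# (assumed WhisperX-style segment list; verify against inference.py)
asrs = {
    "video_001": [
        {"text": "Welcome to the video.", "start": 0.0, "end": 2.4},
        {"text": "Today we will cover...", "start": 2.4, "end": 5.1},
    ],
    "video_002": [],
}
(docs / "asrs" / "asrs_your-subset.json").write_text(json.dumps(asrs, indent=2))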

Once you have the captions extracted and the files above ready, you can extract the chapters like this:

python test.py subset=your-subset prompt=captions_asr

I don’t have a script ready yet for running a single video with captions, as I haven’t had the time, but it’s on my to-do list.

Hope that helps!

@Huangmr0719
Author

Thank you very much for your answer; it fully addresses my question. I believe I need the ASR + captions setting with test.py, so I will try to extract the captions and chapters following your guidance.

Once again, thank you for your answer, and I look forward to your future work!

@lucas-ventura
Owner

You're very welcome, glad it helped!

Just a quick tip: before going all in with ASR + captions and test.py, I’d try running the ASR-only setup with inference.py on a single video first. It's a simple way to make sure everything's working fine.

Good luck with the experiments, and let me know if anything comes up!
