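GitHub - MrXnneHang/auto_labeling_for_BERT_VITS2: This project handles data preprocessing. The first step processes the collected audio, using funasr timestamps to strip out empty background audio. It also covers labeling before the data is fed to BERT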

ADL

This functionality will later be fully integrated into XnneHangLab.

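It uses funasr to generate subtitles, then uses each sentence's start and end times to remove the silent stretches from the audio, filling them with 2 s of silence. This guards against cases such as a screen recording that runs 30 minutes but contains less than 5 minutes of actual speech, which would otherwise waste a lot of compute and time in the later denoising and audio preprocessing steps.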
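The trimming idea described above can be sketched in a few lines. This is only an illustration, not the project's actual code: the function name `remove_silence`, the plain list-of-samples representation, and the choice to place the 2 s pad between consecutive kept sentences are all assumptions. The timestamps are taken to be `(start_ms, end_ms)` pairs such as a recognizer might report.

```python
from typing import List, Sequence, Tuple


def remove_silence(
    samples: Sequence[int],
    sample_rate: int,
    segments_ms: Sequence[Tuple[int, int]],
    pad_seconds: float = 2.0,
) -> List[int]:
    """Keep only the spoken segments and join them with fixed-length silence.

    Hypothetical helper: `segments_ms` holds (start, end) times in
    milliseconds, one pair per recognized sentence, in chronological order.
    """
    pad = [0] * int(pad_seconds * sample_rate)  # zero-valued samples = silence
    out: List[int] = []
    for i, (start_ms, end_ms) in enumerate(segments_ms):
        # Convert millisecond timestamps to sample indices.
        start = start_ms * sample_rate // 1000
        end = min(end_ms * sample_rate // 1000, len(samples))
        if i > 0:
            out.extend(pad)  # assumed: one pad between consecutive sentences
        out.extend(samples[start:end])
    return out


# Toy input: 10 s of "audio" at 10 samples per second, speech in two sentences.
audio = list(range(1, 101))
trimmed = remove_silence(audio, 10, [(0, 1000), (5000, 6000)])
print(len(trimmed))  # 40 samples: 1 s speech + 2 s pad + 1 s speech
```

With real recordings the same arithmetic applies to whatever sample container is in use; only the sample rate and the timestamp list change.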

Features

clip: cuts the audio at the subtitle start and end points, removes the silent segments, and fills them with 2 s of silence.

PS D:\program\auto_labeling_for_BERT_VITS2> uv run label
raw_audio:[WindowsPath('raw_audio/setting.wav')]
Please make sure the audio to process has been placed under ./raw_audio y/n:y
 INFO  Start processing
processing raw_audio\setting.wav --------------------------
funasr version: 1.2.6.
rtf_avg: 0.038: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.60s/it]
rtf_avg: 0.010: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 23/23 [00:01<00:00, 17.12it/s] 
rtf_avg: 0.016: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.89it/s] 
rtf_avg: 0.011, time_speech:  222.517, time_escape: 2.389: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.56s/it] 
 INFO  Shortest audio length: 0.24, longest audio length: 20.855
 INFO
Percentage Distribution:
 INFO    < 2s   : 37.1% (13 items)
 INFO    2-5s   : 25.7% (9 items)
 INFO    5-10s  : 25.7% (9 items)
 INFO    > 10s  : 11.4% (4 items)
Slice directly? 1: continue, 2: cut_sentence, 3: combine_sentence, 4: exit:3
What combine_line do you want? (int, milliseconds) 800
What max_sentence_length do you want? (int, characters) 30
 INFO  Shortest audio length: 0.995, longest audio length: 20.855
 INFO
Percentage Distribution:
 INFO    < 2s   : 21.4% (6 items)
 INFO    2-5s   : 21.4% (6 items)
 INFO    5-10s  : 42.9% (12 items)
 INFO    > 10s  : 14.3% (4 items)
Slice directly? 1: continue, 2: cut_sentence, 3: combine_sentence, 4: exit:1
Progress: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 62.78it/s]
 INFO  All clips were done

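cut: runs a second recognition pass and slices the audio, then recognizes and labels the sliced clips. (Refactoring in progress ....)
Label refinement. (Still under consideration. In general, funasr's recognition accuracy is already impressive, so refinement may not be needed.)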
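In the sample session, choosing combine_sentence with combine_line = 800 and max_sentence_length = 30 reduced the clip count from 35 to 28 and raised the shortest clip from 0.24 to 0.995. The project's actual merge rule is not shown, so the following is only one plausible reading: merge a sentence into the previous one when the gap between them is at most `combine_line` milliseconds and the joined text stays within `max_sentence_length` characters. The `Sentence` class, both function names, and the bucket boundary handling are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sentence:
    # Hypothetical container for one recognized sentence.
    start_ms: int
    end_ms: int
    text: str


def combine_sentences(
    sentences: List[Sentence], combine_line_ms: int, max_sentence_length: int
) -> List[Sentence]:
    """Greedily merge neighbours separated by a short gap (assumed rule)."""
    merged: List[Sentence] = []
    for s in sentences:
        if merged:
            prev = merged[-1]
            gap = s.start_ms - prev.end_ms
            fits = len(prev.text) + len(s.text) <= max_sentence_length
            if gap <= combine_line_ms and fits:
                # Extend the previous sentence instead of starting a new clip.
                merged[-1] = Sentence(prev.start_ms, s.end_ms, prev.text + s.text)
                continue
        merged.append(s)
    return merged


def duration_distribution(sentences: List[Sentence]) -> Dict[str, int]:
    """Count clips per duration bucket, using the labels from the log."""
    buckets = {"< 2s": 0, "2-5s": 0, "5-10s": 0, "> 10s": 0}
    for s in sentences:
        seconds = (s.end_ms - s.start_ms) / 1000
        if seconds < 2:
            buckets["< 2s"] += 1
        elif seconds < 5:
            buckets["2-5s"] += 1
        elif seconds <= 10:  # assumed: exactly 10 s counts as 5-10s
            buckets["5-10s"] += 1
        else:
            buckets["> 10s"] += 1
    return buckets


demo = [
    Sentence(0, 500, "你好"),
    Sentence(900, 1500, "世界"),       # 400 ms after the previous sentence
    Sentence(5000, 12000, "x" * 29),   # 3500 ms gap, stays separate
]
combined = combine_sentences(demo, combine_line_ms=800, max_sentence_length=30)
print(len(combined), duration_distribution(combined))
```

A greedy left-to-right pass keeps the clips in chronological order and never produces text longer than the limit, which matches the observed effect of fewer, longer clips.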

