
yaoxunji/gen-se


GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
The official implementation of GenSE (ICLR 2025)

We propose GenSE, a comprehensive framework tailored for language-model-based speech enhancement. Rather than treating speech enhancement as the continuous signal regression problem defined in existing works, GenSE treats it as a conditional language modeling task. This is achieved by tokenizing speech signals into semantic tokens using a pre-trained self-supervised model, and into acoustic tokens using a custom-designed single-quantizer neural codec model.
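Both token types come down to mapping continuous frame-level vectors to discrete indices. The sketch below (NumPy, with hypothetical function names `quantize` and `dequantize`) shows single-codebook nearest-neighbor quantization: with one quantizer, each frame yields exactly one token, which is consistent with the `[B, L1, 1]` code shape printed in the SimCodec example. The same argmin is how k-means cluster assignment turns self-supervised features into discrete units in many systems; whether GenSE's semantic tokenizer works exactly this way is an assumption, not something this README states.

```python
import numpy as np


def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each frame to the index of its nearest codeword (squared L2).

    features: [B, L, D] frame-level vectors; codebook: [K, D].
    Returns integer tokens of shape [B, L, 1]: one token per frame,
    because there is a single quantizer (one codebook).
    """
    # Squared distance between every frame and every codeword: [B, L, K]
    diff = features[:, :, None, :] - codebook[None, None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # Nearest codeword index per frame, with a trailing quantizer axis of size 1
    return np.argmin(dist, axis=-1)[..., None]


def dequantize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Look tokens up in the codebook: [B, L, 1] -> [B, L, D]."""
    return codebook[tokens[..., 0]]


rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))           # K=8 codewords of dimension D=4
ids = np.array([[1, 5, 5, 2]])               # [B=1, L=4] ground-truth indices
feats = codebook[ids] + 0.01                 # [1, 4, 4] frames slightly off those codewords
tokens = quantize(feats, codebook)
print(tokens[..., 0])
```

A real codec learns the codebook jointly with its encoder and decoder; here it is random only so the lookup logic can be checked in isolation.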

GenSE employs a hierarchical modeling framework with a two-stage process: an N2S transformation front-end, which converts noisy speech into clean semantic tokens, and an S2S generation back-end, which synthesizes clean speech using both the semantic tokens and the noisy acoustic tokens.
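The data flow of the two stages can be sketched as plain function composition. This is not the repository's code: every component name is a hypothetical stand-in for a real model, and the sketch assumes the N2S front-end consumes the noisy semantic tokens (the README only says it maps noisy speech to clean semantic tokens) and that the codec decoder turns the generated acoustic tokens back into a waveform.

```python
from typing import Callable, List

Tokens = List[int]
Wave = List[float]


def enhance(
    noisy_wav: Wave,
    semantic_tokenizer: Callable[[Wave], Tokens],   # hypothetical: SSL model + quantizer
    acoustic_tokenizer: Callable[[Wave], Tokens],   # hypothetical: codec encoder
    n2s: Callable[[Tokens], Tokens],                # hypothetical: front-end LM
    s2s: Callable[[Tokens, Tokens], Tokens],        # hypothetical: back-end LM
    codec_decoder: Callable[[Tokens], Wave],        # hypothetical: codec decoder
) -> Wave:
    # Tokenize the noisy input into a semantic view and an acoustic view
    noisy_semantic = semantic_tokenizer(noisy_wav)
    noisy_acoustic = acoustic_tokenizer(noisy_wav)
    # Stage 1 (N2S): noisy semantic tokens -> clean semantic tokens
    clean_semantic = n2s(noisy_semantic)
    # Stage 2 (S2S): clean semantic tokens + noisy acoustic tokens -> clean acoustic tokens
    clean_acoustic = s2s(clean_semantic, noisy_acoustic)
    # Turn the clean acoustic tokens back into a waveform
    return codec_decoder(clean_acoustic)


# Toy stand-ins, only to show how the stages are wired together
out = enhance(
    [0.1, 0.2, 0.3],
    semantic_tokenizer=lambda w: [round(x * 10) for x in w],
    acoustic_tokenizer=lambda w: [round(x * 100) for x in w],
    n2s=lambda s: [t + 1 for t in s],
    s2s=lambda s, a: [si + ai for si, ai in zip(s, a)],
    codec_decoder=lambda a: [t / 100 for t in a],
)
print(out)
```

Keeping the stages as separate callables mirrors the hierarchical split: the front-end only has to reason about content (semantic tokens), while the back-end uses the noisy acoustic tokens for acoustic detail.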

TODO 📝

  • Release inference pipeline
  • Release pre-trained model
  • Colab support
  • More to be added

Getting Started 📥

1. Prerequisites

  1. PyTorch >= 1.13 and torchaudio >= 0.13
  2. Install requirements:
conda create -n gense python=3.8
conda activate gense
pip install -r requirements.txt
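To confirm that the installed versions meet these minimums, a small check like the following may help. It is a sketch that compares only the numeric release part, so local tags such as `+cu117` are ignored and pre-releases count as the final release. `version_tuple`, `meets_minimum` and `check_installed` are hypothetical helpers, not part of the repository.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric release, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: str, minimum: str) -> bool:
    """True if `version` is at least `minimum` (release numbers only)."""
    v, m = version_tuple(version), version_tuple(minimum)
    # Pad with zeros so (1, 13) compares equal to (1, 13, 0)
    width = max(len(v), len(m))
    return v + (0,) * (width - len(v)) >= m + (0,) * (width - len(m))


def check_installed() -> None:
    """Print whether torch and torchaudio satisfy the README's minimums."""
    try:
        import torch
        import torchaudio
    except ImportError as exc:
        print(f"missing dependency: {exc.name}")
        return
    for name, installed, minimum in [
        ("torch", torch.__version__, "1.13"),
        ("torchaudio", torchaudio.__version__, "0.13"),
    ]:
        status = "OK" if meets_minimum(str(installed), minimum) else "too old"
        print(f"{name} {installed}: {status} (need >= {minimum})")
```

Call `check_installed()` inside the activated `gense` environment after installing the requirements.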

2. Get Self-supervised Model:

Download the XLSR model and move it to the ckpts directory.
Alternatively, download WavLM Large to run a variant that uses WavLM in place of XLSR.

3. Pre-trained Model:

Download the pre-trained model from Hugging Face; all checkpoints should be stored in the ckpts directory.

4. Speech Enhancement:

python infer.py run \
  --noisy_path noisy.wav \
  --out_path ./enhanced.wav \
  --config_path configs/gense.yaml
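To enhance a whole directory, one option is to call the same command once per file. This is a minimal sketch that uses only the flags shown above; `build_infer_cmd` and `enhance_dir` are hypothetical helpers, and the script assumes it is run from the repository root so that `infer.py` and `configs/gense.yaml` resolve.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_infer_cmd(noisy: Path, out: Path, config: str = "configs/gense.yaml") -> List[str]:
    """Assemble the infer.py command for a single file."""
    return [
        sys.executable, "infer.py", "run",   # same interpreter as the caller
        "--noisy_path", str(noisy),
        "--out_path", str(out),
        "--config_path", config,
    ]


def enhance_dir(noisy_dir: str, out_dir: str) -> None:
    """Run infer.py once per .wav file in noisy_dir, writing same-named files to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(noisy_dir).glob("*.wav")):
        # check=True raises CalledProcessError on the first failing file
        subprocess.run(build_infer_cmd(wav, out / wav.name), check=True)
```

Because each call starts a new process, the model is reloaded for every file. That keeps the sketch independent of `infer.py` internals, but for large batches loading the model once in Python would be faster.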

5. SimCodec Copy-syn:

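import torch
import torchaudio
from components.simcodec.model import SimCodec

codec = SimCodec('config.json')
codec.load_ckpt('g_00100000')
codec = codec.eval()
codec = codec.to('cuda')

# 'input.wav' is a placeholder path; the output below is saved at 16 kHz, so resample to match.
# The [B, 1, T] input layout is an assumption; check the model code.
wav, sr = torchaudio.load('input.wav')
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)
wav = wav[:1].unsqueeze(0).to('cuda')  # keep one channel -> [1, 1, T]

with torch.no_grad():
    code = codec(wav)
    print(code.shape)  # [B, L1, 1]
    syn = codec.decode(code)
    print(syn.shape)  # [B, 1, L2]
torchaudio.save('copy.wav', syn.detach().cpu().squeeze(0), 16000)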
