GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
The official implementation of GenSE (ICLR 2025)
We propose a comprehensive framework tailored for language model-based speech enhancement, called GenSE. Speech enhancement is treated as a conditional language modeling task rather than the continuous signal regression problem defined in existing works. This is achieved by tokenizing speech signals into semantic tokens using a pre-trained self-supervised model, and into acoustic tokens using a custom-designed single-quantizer neural codec model.
GenSE employs a hierarchical modeling framework with a two-stage process: an N2S transformation front-end, which converts noisy speech into clean semantic tokens, and an S2S generation back-end, which synthesizes clean speech from both the semantic tokens and the noisy acoustic tokens.
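As a rough illustration of this two-stage flow, the sketch below uses dummy token sequences and placeholder functions (`n2s`, `s2s` are illustrative names, not the repository's API); the real stages are autoregressive language models.

```python
# Illustrative sketch of GenSE's hierarchical pipeline.
# Token values and model logic are dummies standing in for the actual LMs.

def n2s(noisy_semantic_tokens):
    """Stage 1 (N2S): map noisy semantic tokens to clean semantic tokens."""
    # A real model would run autoregressive LM decoding here.
    return list(noisy_semantic_tokens)

def s2s(clean_semantic_tokens, noisy_acoustic_tokens):
    """Stage 2 (S2S): generate clean acoustic tokens conditioned on both
    the clean semantic tokens and the noisy acoustic tokens."""
    # Dummy: emits one acoustic token per semantic token.
    return [hash((s, a)) % 1024
            for s, a in zip(clean_semantic_tokens, noisy_acoustic_tokens)]

# Dummy token sequences "extracted" from a noisy utterance.
noisy_semantic = [12, 7, 43, 99]
noisy_acoustic = [501, 88, 230, 17]

clean_semantic = n2s(noisy_semantic)
clean_acoustic = s2s(clean_semantic, noisy_acoustic)
print(len(clean_acoustic))  # one clean acoustic token per input frame
```

The point of the hierarchy is that stage 1 operates purely in the (noise-robust) semantic space, while stage 2 recovers speaker and acoustic detail from the noisy acoustic tokens.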
- Release inference pipeline
- Release pre-trained model
- Support running in Colab
- More to be added
- PyTorch >= 1.13 and torchaudio >= 0.13
- Install requirements
```shell
conda create -n gense python=3.8
conda activate gense
pip install -r requirements.txt
```
Download the XLSR model and move it to the ckpts dir.
or
Download WavLM Large to run a variant that uses WavLM in place of XLSR.
Download the pre-trained models from Hugging Face; all checkpoints should be stored in the ckpts dir.
```shell
python infer.py run \
    --noisy_path noisy.wav \
    --out_path ./enhanced.wav \
    --config_path configs/gense.yaml
```
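To enhance many files, the same invocation can be assembled programmatically. This helper is a sketch that simply mirrors the flags of the command above:

```python
import subprocess

def build_infer_cmd(noisy_path, out_path, config_path="configs/gense.yaml"):
    """Build the `infer.py run` command line shown above."""
    return [
        "python", "infer.py", "run",
        "--noisy_path", noisy_path,
        "--out_path", out_path,
        "--config_path", config_path,
    ]

cmd = build_infer_cmd("noisy.wav", "./enhanced.wav")
print(" ".join(cmd))
# To actually run it from the repo root:
# subprocess.run(cmd, check=True)
```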
```python
import torchaudio
from components.simcodec.model import SimCodec

# Load the single-quantizer codec from its config and checkpoint.
codec = SimCodec('config.json')
codec.load_ckpt('g_00100000')
codec = codec.eval()
codec = codec.to('cuda')

wav, sr = torchaudio.load('input.wav')  # example path; expects 16 kHz audio
wav = wav.to('cuda')

code = codec(wav)         # encode waveform to acoustic tokens
print(code.shape)         # [B, L1, 1]
syn = codec.decode(code)  # decode tokens back to a waveform
print(syn.shape)          # [B, 1, L2]
torchaudio.save('copy.wav', syn.detach().cpu().squeeze(0), 16000)
```