SAE_Math

Code for the ETH MSc Thesis: Sparse Autoencoders vs. Activation Difference for Language Model Steering (jiaqingxie/steer-sae)

Training sparse autoencoders (SAEs) on state-of-the-art math LLMs.
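
As background, here is a minimal sketch of the standard SAE setup: an overcomplete encoder/decoder trained with a reconstruction loss plus an L1 sparsity penalty on the latent activations. The class name, initialization, and hyperparameters are illustrative assumptions, not this repository's actual training code.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    # d_model: residual-stream width; d_sae: number of learned features (usually d_sae >> d_model)
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x):
        f = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)  # sparse feature activations
        x_hat = f @ self.W_dec + self.b_dec                          # reconstruction
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # MSE reconstruction error plus an L1 penalty that encourages sparse features
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().sum(dim=-1).mean()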

News

  • 2024.10.3: Repository online
  • 2025.1.10: Analysis I finished

Installation

Create a Python virtual environment and install the dependencies:

python -m venv sae
source sae/bin/activate
pip install vllm==0.7.2
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
pip install transformers
pip install transformer_lens sae_lens
pip install seaborn word2number
pip install nltk langdetect
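
Once the environment is set up, the following hedged sketch illustrates the activation-difference steering recipe that the thesis compares against SAE-based steering, using transformer_lens (installed above). The model, layer, prompts, and steering coefficient are illustrative assumptions, not the thesis's actual experimental settings.

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # placeholder model; the thesis targets math LLMs
hook_name = "blocks.8.hook_resid_post"             # illustrative layer choice

def resid_at(prompt):
    # Cache all activations and return the last-token residual-stream vector
    _, cache = model.run_with_cache(prompt)
    return cache[hook_name][0, -1]

# Steering vector = difference of activations on a contrasting prompt pair (illustrative prompts)
steer = resid_at("Let's think step by step.") - resid_at("Just guess the answer.")

def add_steering(resid, hook, coeff=4.0):
    # Add the scaled difference vector to the residual stream at every position
    return resid + coeff * steer

logits = model.run_with_hooks(
    "What is 12 * 7?",
    fwd_hooks=[(hook_name, add_steering)],
)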

Analysis I

Analysis II

Train and test on: ifeval_wo_instructions.jsonl, ifeval_single_keyword_include.jsonl, and ifeval_single_keyword_exclude.jsonl

Length constraints: Answer using {at most} {K} sentences.

Format constraints: lowercase output / JSON format / highlighted sentences

Dataset: IFEval
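
For reference, a minimal sketch of how such constraints can be verified programmatically, using nltk (installed above) for sentence splitting. The function names and exact checks are illustrative assumptions, not IFEval's official scoring code.

import json
import nltk

nltk.download("punkt")  # sentence tokenizer data; newer NLTK versions may also need "punkt_tab"

def check_max_sentences(answer: str, k: int) -> bool:
    # Length constraint: "Answer using at most K sentences."
    return len(nltk.sent_tokenize(answer)) <= k

def check_lowercase(answer: str) -> bool:
    # Format constraint: the response contains no uppercase letters
    return answer == answer.lower()

def check_json(answer: str) -> bool:
    # Format constraint: the entire response parses as valid JSON
    try:
        json.loads(answer)
        return True
    except json.JSONDecodeError:
        return False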
