Trustworthy-ML-Lab

Neuron_Eval Public
[ICML 25] A unified mathematical framework to evaluate neuron explanations of deep learning models with sanity tests

1 star, 0 forks, 0 issues, 0 pull requests, updated May 27, 2025

CB-LLMs Public
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.

Python, 13 stars, 3 forks, 0 issues, 0 pull requests, updated May 26, 2025

posthoc-generative-cbm Public
[CVPR 2025] Concept Bottleneck Autoencoder (CB-AE): efficiently transforms any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision while preserving image quality

Jupyter Notebook, 7 stars, 1 fork, 1 issue, 0 pull requests, updated May 13, 2025

ThinkEdit Public
An effective weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.

Python, 12 stars, 2 forks, 0 issues, 0 pull requests, updated May 6, 2025

Linear-Explanations Public
[ICML 24] A novel automated neuron explanation framework that can accurately describe polysemantic concepts in deep neural networks

Jupyter Notebook, 12 stars, 0 forks, 0 issues, 0 pull requests, updated May 2, 2025

effective_skill_unlearning Public
[NAACL 25] Two novel, lightweight, training-free skill unlearning methods for LLMs

Python, 3 stars, 0 forks, 0 issues, 0 pull requests, updated Mar 27, 2025

RAT_MisD Public
Boosting misclassification detection ability by radius-aware training (RAT)

Python, 0 stars, 0 forks, 0 issues, 0 pull requests, updated Mar 21, 2025

Describe-and-Dissect Public
[TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models

Jupyter Notebook, 10 stars, 2 forks, 1 issue, 0 pull requests, updated Feb 20, 2025

Concept-Bottleneck-LLM Public

Python, 5 stars, 0 forks, 0 issues, 0 pull requests, updated Feb 1, 2025

provable-efficient-dataset-distill-KRR Public

Python, 1 star, Apache-2.0 license, 0 forks, 0 issues, 0 pull requests, updated Dec 10, 2024
