p-e-w/heretic
Fully automatic censorship removal for language models
Combines directional ablation with Optuna's TPE-based hyperparameter optimization to automatically identify abliteration parameters that minimize refusals while preserving model capabilities via a KL-divergence constraint. Supports dense and mixture-of-experts (MoE) PyTorch models, with optional bitsandbytes quantization to reduce VRAM requirements. Includes research tooling for interpretability analysis, such as PaCMAP-based visualization of residual vectors across transformer layers.
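To make the search loop concrete, here is a minimal, self-contained sketch of the two ideas named above: projecting a refusal direction out of residual activations, and letting Optuna's TPE sampler tune the ablation strength under a KL-style penalty. Everything in it is illustrative, not heretic's actual code; the direction, the activations, and both evaluation proxies are synthetic stand-ins.

```python
# Illustrative sketch only: directional ablation plus a TPE search with a
# KL-style penalty. All names, ranges, and proxies below are hypothetical.
import optuna
import torch

torch.manual_seed(0)
refusal_dir = torch.nn.functional.normalize(torch.randn(64), dim=0)  # toy unit vector
hidden = torch.randn(8, 64)  # toy residual-stream activations

def ablate(h: torch.Tensor, direction: torch.Tensor, weight: float) -> torch.Tensor:
    """Directional ablation: subtract the component of h along `direction`."""
    proj = (h @ direction).unsqueeze(-1) * direction
    return h - weight * proj

def objective(trial: optuna.Trial) -> float:
    # Hyperparameter the TPE sampler explores (range is illustrative).
    weight = trial.suggest_float("ablation_weight", 0.0, 1.5)
    ablated = ablate(hidden, refusal_dir, weight)
    # Stand-ins for the real measurements: alignment with the refusal
    # direction as a refusal proxy, and edit magnitude as a KL proxy.
    refusal_proxy = (ablated @ refusal_dir).abs().mean().item()
    kl_proxy = (ablated - hidden).pow(2).mean().item()
    # Single scalar: minimize refusals while penalizing capability drift.
    return refusal_proxy + 10.0 * kl_proxy

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=30)
print(study.best_params)
```

In the real tool, the two proxies would be replaced by measured refusal counts and a KL divergence computed against the unmodified model's output distribution, per the description above.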
12,369 stars. Actively maintained with 17 commits in the last 30 days.
Stars: 12,369
Forks: 1,273
Language: Python
License: AGPL-3.0
Last pushed: Mar 13, 2026
Commits (30d): 17
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/p-e-w/heretic"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
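The same data can be fetched from any HTTP client. A minimal Python sketch of the keyless call, assuming the endpoint returns JSON (the payload schema is not documented here):

```python
# Keyless request against the endpoint shown above; assumes a JSON response.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/p-e-w/heretic"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # raises on rate limiting (HTTP 429) or other errors
print(resp.json())
```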
Related models
ModelTC/LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs,...
YerbaPage/LongCodeZip
LongCodeZip: Compress Long Context for Code Language Models [ASE2025]
Orion-zhen/abliteration
Make abliterated models with transformers, easy and fast
FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
zyushun/Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793