p-e-w/heretic
Fully automatic censorship removal for language models
Combines directional ablation with Optuna's TPE-based hyperparameter optimization to automatically identify abliteration parameters that minimize refusals while preserving model capabilities via a KL-divergence constraint. Supports dense and mixture-of-experts (MoE) PyTorch models, with optional bitsandbytes quantization to reduce VRAM requirements. Includes research tooling for interpretability analysis, such as PaCMAP-based visualization of residual vectors across transformer layers.
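To make the search loop concrete, here is a minimal, self-contained sketch of the two ideas named above: projecting a refusal direction out of residual activations, and letting Optuna's TPE sampler tune the ablation strength under a KL-style penalty. Everything in it is illustrative, not heretic's actual code; the direction, the activations, and both evaluation proxies are synthetic stand-ins.

```python
# Illustrative sketch only: directional ablation plus a TPE search with a
# KL-style penalty. All names, ranges, and proxies below are hypothetical.
import optuna
import torch

torch.manual_seed(0)
refusal_dir = torch.nn.functional.normalize(torch.randn(64), dim=0)  # toy unit vector
hidden = torch.randn(8, 64)  # toy residual-stream activations

def ablate(h: torch.Tensor, direction: torch.Tensor, weight: float) -> torch.Tensor:
    """Directional ablation: subtract the component of h along `direction`."""
    proj = (h @ direction).unsqueeze(-1) * direction
    return h - weight * proj

def objective(trial: optuna.Trial) -> float:
    # Hyperparameter the TPE sampler explores (range is illustrative).
    weight = trial.suggest_float("ablation_weight", 0.0, 1.5)
    ablated = ablate(hidden, refusal_dir, weight)
    # Stand-ins for the real measurements: alignment with the refusal
    # direction as a refusal proxy, and edit magnitude as a KL proxy.
    refusal_proxy = (ablated @ refusal_dir).abs().mean().item()
    kl_proxy = (ablated - hidden).pow(2).mean().item()
    # Single scalar: minimize refusals while penalizing capability drift.
    return refusal_proxy + 10.0 * kl_proxy

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=30)
print(study.best_params)
```

In the real tool, the two proxies would be replaced by measured refusal counts and a KL divergence computed against the unmodified model's output distribution, per the description above.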
12,369 stars. Actively maintained with 17 commits in the last 30 days.
Stars: 12,369
Forks: 1,273
Language: Python
License: AGPL-3.0
Last pushed: Mar 13, 2026
Commits (30d): 17
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/p-e-w/heretic"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
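The same data can be fetched from any HTTP client. A minimal Python sketch of the keyless call, assuming the endpoint returns JSON (the payload schema is not documented here):

```python
# Keyless request against the endpoint shown above; assumes a JSON response.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/p-e-w/heretic"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # raises on rate limiting (HTTP 429) or other errors
print(resp.json())
```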
Related models
ModelTC/LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs,...
YerbaPage/LongCodeZip
LongCodeZip: Compress Long Context for Code Language Models [ASE2025]
Orion-zhen/abliteration
Make abliterated models with transformers, easy and fast
FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
zyushun/Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793