Tencent/AngelSlim

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

Verified

Supports multiple compression strategies: quantization algorithms (FP8, INT4, INT8, and exotic formats such as NVFP4 and 1.25-bit Sherry), speculative decoding frameworks (Eagle3, SpecExit), and pruning, across LLMs, vision-language models, and diffusion models. Built on a unified post-training quantization (PTQ) pipeline optimized for single-GPU operation on models of up to 235B parameters. Integrates with the Hugging Face and ModelScope ecosystems and supports inference backends including vLLM and Torch.
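To illustrate what a post-training weight quantization step does, here is a minimal sketch of symmetric per-channel INT8 quantization with absmax scaling. This is a generic textbook technique written in plain Python. It is not AngelSlim's API or its actual algorithm, and the function names `quantize_int8_per_channel` and `dequantize` are hypothetical.

```python
def quantize_int8_per_channel(weights):
    """Hypothetical helper: symmetric absmax INT8 quantization, one scale per row.

    Each row (output channel) gets scale = max|w| / 127, so its largest
    magnitude maps to +/-127 and every value is rounded to an integer.
    """
    q_rows, scales = [], []
    for row in weights:
        absmax = max(abs(w) for w in row)
        # Guard against an all-zero row, which would otherwise divide by zero.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        # Round to the nearest integer and clamp to the symmetric INT8 range.
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Hypothetical helper: map INT8 values back to floats with their row scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.27, 0.02], [2.54, 0.0, -0.1]]
    q, s = quantize_int8_per_channel(w)
    w_hat = dequantize(q, s)
    # Rounding error per element is bounded by half of that row's scale.
    max_err = max(abs(a - b) for r1, r2 in zip(w, w_hat) for a, b in zip(r1, r2))
    print(q)
    print(max_err)
```

Using one scale per output channel rather than one per tensor keeps a single large-magnitude row from inflating the rounding error of every other row. Real PTQ pipelines typically add calibration data, activation quantization, and fused low-precision kernels on top of this basic idea.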

536 stars and 5,117 monthly downloads. Actively maintained with 21 commits in the last 30 days. Available on PyPI.

Maintenance 23 / 25

Adoption 19 / 25

Maturity 24 / 25

Community 19 / 25

Stars: 536
Forks: not listed
Language: Python
License: not listed

Related tools

kyo-takano/chinchilla

A toolkit for scaling law research ⚖

nebuly-ai/optimate

A collection of libraries to optimise AI model performances

liyucheng09/Selective_Context

Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40%...

antgroup/glake

GLake: optimizing GPU memory management and IO transmission.

microsoft/only_train_once

OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators,...
