picollm and SqueezeLLM

X-bit quantization and dense-and-sparse quantization are two distinct approaches to LLM compression: the former reduces every weight to a compact low-bit representation, while the latter splits the weight matrix into a low-bit dense part and a small full-precision sparse part that holds the outlier weights. They solve the same problem in different ways, making picollm and SqueezeLLM alternative techniques rather than tools designed to work together.

|                | picollm          | SqueezeLLM                           |
|----------------|------------------|--------------------------------------|
| Overall score  | 57 (Established) | 41 (Emerging)                        |
| Maintenance    | 10/25            | 0/25                                 |
| Adoption       | 10/25            | 10/25                                |
| Maturity       | 25/25            | 16/25                                |
| Community      | 12/25            | 15/25                                |
| Stars          | 305              | 713                                  |
| Forks          | 17               | 49                                   |
| Downloads      | —                | —                                    |
| Commits (30d)  | 0                | 0                                    |
| Language       | Python           | Python                               |
| License        | Apache-2.0       | MIT                                  |
| Flags          | No Dependents    | Stale 6m, No Package, No Dependents  |

About picollm

Picovoice/picollm

On-device LLM Inference Powered by X-Bit Quantization

This tool helps developers integrate highly accurate, compressed large language models (LLMs) directly into their applications, so AI-powered features can run on user devices or local servers. It takes open-weight LLMs, compresses them, and serves efficient, private inference, enabling features like local voice assistants or smart text generation. It is aimed at software engineers building applications that need offline AI capabilities.

mobile-app-development, edge-ai, offline-ai, embedded-systems, privacy-focused-applications
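picollm's X-Bit quantization algorithm is proprietary and not described in this page, but the general idea it builds on, representing weights with only a few bits each, can be sketched in a few lines. The function below is an illustrative example of plain uniform low-bit quantization, not picollm's actual method:

```python
# Illustrative sketch of uniform n-bit weight quantization. This is NOT
# the picoLLM X-Bit algorithm (which is proprietary); it only shows the
# basic idea: snap each float weight to one of 2**bits evenly spaced levels.

def quantize_uniform(weights, bits):
    """Quantize a list of floats to at most 2**bits uniform levels."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Round each weight to its nearest level index, then map back to a float.
    return [lo + round((w - lo) / scale) * scale for w in weights]

weights = [-0.9, -0.3, 0.0, 0.45, 1.2]
q2 = quantize_uniform(weights, 2)   # only 4 representable values: coarse
q8 = quantize_uniform(weights, 8)   # 256 levels: nearly lossless here
```

At 2 bits the five weights collapse onto at most four levels, while at 8 bits the rounding error is bounded by half a level (about 0.004 for this range), which is why lower bit widths trade accuracy for memory.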

About SqueezeLLM

SqueezeAILab/SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

This project helps machine learning engineers and MLOps specialists deploy large language models (LLMs) more efficiently. It takes existing LLM weights (like LLaMA, Vicuna, or Mistral) and processes them to produce smaller, optimized model weights. The result is an LLM that requires significantly less memory to run, while often maintaining or even improving its accuracy and speed.

LLM deployment, model optimization, GPU efficiency, AI infrastructure, machine learning operations

Scores updated daily from GitHub, PyPI, and npm data.