chatglm.cpp and ChatGLM2-6B

chatglm.cpp is a C++ implementation of the ChatGLM model family, which makes it an ecosystem sibling of the core ChatGLM2-6B model rather than a competitor. It offers an alternative runtime for deployment and inference, which may bring performance advantages or better compatibility with specific hardware.

chatglm.cpp (Emerging)

Scores: Maintenance 0/25; Adoption 10/25; Maturity 16/25; Community 21/25

Stats: Stars 2,960; Forks 329; Downloads —; Commits (30d) 0; Language C++; License MIT

ChatGLM2-6B (Emerging)

Scores: Maintenance 0/25; Adoption 10/25; Maturity 16/25; Community 21/25

Stats: Stars 15,645; Forks 1,820; Downloads —; Commits (30d) 0; Language Python; License —

Stale 6m; No Package; No Dependents

About chatglm.cpp

li-plus/chatglm.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

About ChatGLM2-6B

zai-org/ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

Built on the GLM base architecture, ChatGLM2-6B uses Multi-Query Attention for more efficient inference and FlashAttention to extend the base model's context length from 2K tokens in ChatGLM-6B to 32K, with an 8K-token context used during dialogue training. According to the README, inference is 42% faster than the first-generation model, and under INT4 quantization the dialogue length that fits in 6GB of GPU memory grows from 1K to 8K tokens. The model was trained on 1.4T bilingual tokens with hybrid objectives and human preference alignment, loads through HuggingFace's transformers library, and supports INT4/INT8 quantization for deployment on resource-constrained hardware.
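The memory figures in the summary above can be sanity-checked with rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement. The parameter count (about 6.2 billion) and the attention shape (28 layers, 32 query heads, 2 key/value groups, head dimension 128) are assumptions chosen for illustration and do not come from this page. The helper names `weight_bytes` and `kv_cache_bytes` are hypothetical. It ignores activations, quantization scales, and framework overhead.

```python
"""Rough memory estimate for a ChatGLM2-6B-like model (illustrative only)."""

GIB = 1024 ** 3


def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store the weights alone at a given precision."""
    return n_params * bits_per_weight // 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes for the cached keys and values of one sequence.

    The leading factor 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed model shape (not taken from this page).
N_PARAMS = 6_200_000_000
N_LAYERS = 28
N_QUERY_HEADS = 32
N_KV_GROUPS = 2      # Multi-Query Attention shares K/V across query heads
HEAD_DIM = 128
SEQ_LEN = 8192       # the 8K-token conversation length

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name} weights: {weight_bytes(N_PARAMS, bits) / GIB:.2f} GiB")

# FP16 cache if every query head kept its own keys and values.
full_cache = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# FP16 cache when only N_KV_GROUPS key/value heads are stored.
mqa_cache = kv_cache_bytes(N_LAYERS, N_KV_GROUPS, HEAD_DIM, SEQ_LEN)

print(f"KV cache, 32 K/V heads: {full_cache / GIB:.2f} GiB")
print(f"KV cache, 2 K/V groups: {mqa_cache / GIB:.2f} GiB")
```

Under these assumptions, INT4 weights take roughly 3 GiB, and sharing keys and values across query heads shrinks the 8K-token cache by a factor of 16. That is consistent with the claim that an 8K conversation fits on a 6GB card only when quantization and Multi-Query Attention are combined.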

Related comparisons

chatglm.cpp and ChatLLM

chatglm.cpp and chatchat

chatglm.cpp and VisualGLM-6B

Scores updated daily from GitHub, PyPI, and npm data.