ChatGLM2-6B and VisualGLM-6B
These are ecosystem siblings: ChatGLM2-6B is a text-only LLM, while VisualGLM-6B extends the same GLM family to combined image-and-text input, so users can choose the variant that matches their input modality.
About ChatGLM2-6B
zai-org/ChatGLM2-6B
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
Based on the README, here's a technical summary: ChatGLM2-6B builds on the GLM architecture and uses Multi-Query Attention for efficient inference. With FlashAttention, context length is extended to 32K tokens during pretraining (8K in the dialogue stage), inference is roughly 42% faster than the first generation, and under INT4 quantization 6GB of GPU memory is enough for 8K-token conversations. Pretrained on 1.4T bilingual Chinese-English tokens with a hybrid objective and aligned with human preferences, it loads directly through HuggingFace's transformers library and supports INT4/INT8 quantization for deployment on resource-constrained hardware.
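To see why Multi-Query Attention plus INT4 quantization makes 8K-token dialogue feasible in ~6GB, here is a back-of-envelope memory sketch. The config values used (28 layers, 32 query heads, 2 shared KV groups of 128 channels, ~6.2B parameters) are assumptions drawn from the model's published configuration; verify them against the repo before relying on the numbers.

```python
# Back-of-envelope memory sketch for ChatGLM2-6B (config values are assumptions).
def kv_cache_bytes(seq_len, layers=28, kv_heads=2, head_dim=128, dtype_bytes=2):
    """Size of the fp16 KV cache: K and V tensors per layer, per cached token."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes * seq_len

mqa = kv_cache_bytes(8192)               # multi-query: 2 shared KV groups
mha = kv_cache_bytes(8192, kv_heads=32)  # hypothetical full multi-head baseline
weights_int4 = 6.2e9 * 0.5               # 4-bit weights ~ 0.5 bytes per parameter

print(f"INT4 weights:        {weights_int4 / 2**30:.1f} GiB")
print(f"KV cache @8K (MQA):  {mqa / 2**30:.2f} GiB")
print(f"KV cache @8K (MHA):  {mha / 2**30:.2f} GiB")
```

With MQA the 8K cache is roughly 0.22 GiB versus about 3.5 GiB for a full multi-head baseline, which is the difference between fitting and not fitting next to ~3 GiB of INT4 weights on a 6GB card.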
About VisualGLM-6B
zai-org/VisualGLM-6B
Chinese and English multimodal conversational language model | 多模态中英双语对话语言模型
Built on ChatGLM-6B with a BLIP2-QFormer bridging visual and language representations, VisualGLM-6B is pretrained on 30M Chinese and 300M English image-text pairs. It supports parameter-efficient fine-tuning via LoRA, QLoRA, and P-tuning through the SwissArmyTransformer framework, and INT4 quantization enables deployment on consumer GPUs with as little as 6.3GB of VRAM.
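The appeal of LoRA-style tuning on consumer GPUs comes down to parameter counts: a rank-r adapter on a d_in x d_out linear layer trains r*(d_in + d_out) parameters instead of all d_in*d_out weights. A generic arithmetic sketch (the 4096 hidden size and rank 8 are illustrative assumptions, not VisualGLM's exact settings):

```python
# Illustrative LoRA parameter-count sketch (values are assumptions, not
# VisualGLM-6B's actual configuration).
def lora_params(d_in, d_out, r):
    """Trainable parameters added by a rank-r LoRA adapter (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)

full = 4096 * 4096                       # one attention projection, hidden size 4096
adapter = lora_params(4096, 4096, r=8)   # rank-8 adapter on the same layer
print(f"full layer: {full:,} params; LoRA r=8: {adapter:,} "
      f"({100 * adapter / full:.2f}% of the layer)")
```

Training well under 1% of each adapted layer's weights is what keeps optimizer state and gradients small enough for single consumer-GPU fine-tuning.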