haotian-liu/LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Score: 47 / 100 (Emerging)

Combines a CLIP vision encoder with a lightweight projection layer to align image features with a large language model, enabling end-to-end instruction tuning on image-text pairs. Supports efficient fine-tuning via LoRA, 4/8-bit quantization, and (in newer versions) variable-resolution inputs with up to 4x higher pixel density. Integrates with Hugging Face, llama.cpp, and AutoGen, with pretrained checkpoints spanning multiple base models (LLaMA, Llama-2, Qwen, Llama-3).
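The core idea described above can be sketched in a few lines: a frozen vision encoder produces patch features, a learned projection maps them into the LLM's token-embedding space, and the projected "image tokens" are prepended to the text embeddings. This is a conceptual sketch, not LLaVA's actual code; the dimensions and random weights are illustrative only.

```python
import numpy as np

# Illustrative dimensions (not the real model's): CLIP feature size,
# LLM hidden size, number of image patches, number of prompt tokens.
D_VISION, D_LLM, N_PATCHES, N_TOKENS = 1024, 4096, 576, 32

rng = np.random.default_rng(0)

# Output of the frozen vision encoder: one feature vector per patch.
clip_features = rng.standard_normal((N_PATCHES, D_VISION))

# The lightweight, trainable projection layer (here a single linear map).
W_proj = rng.standard_normal((D_VISION, D_LLM)) * 0.01

# Project image features into the LLM's embedding space.
image_tokens = clip_features @ W_proj            # shape (576, 4096)

# Embedded text prompt tokens (stand-in for the LLM's embedding lookup).
text_tokens = rng.standard_normal((N_TOKENS, D_LLM))

# The LLM consumes the concatenated sequence end to end, which is what
# makes joint instruction tuning on image-text pairs possible.
llm_input = np.concatenate([image_tokens, text_tokens], axis=0)
print(llm_input.shape)  # (608, 4096)
```

In the real model only the projection (and, during instruction tuning, the LLM) is trained, while the vision encoder stays frozen; LoRA and quantization reduce the cost of the LLM side of that training.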

24,554 stars. No commits in the last 6 months.

Stale (6m) · No Package · No Dependents

Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 21 / 25


Stars: 24,554
Forks: 2,745
Language: Python
License: Apache-2.0
Last pushed: Aug 12, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/haotian-liu/LLaVA"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
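The same endpoint can be called from Python. A minimal sketch, assuming only the URL shown in the curl example above; the `quality_url` and `fetch_quality` helper names, the `registry` parameter, and the JSON response shape are illustrative assumptions, not part of a documented client.

```python
import json
import urllib.request

# Base URL copied from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, registry: str = "transformers") -> str:
    """Build the per-repo quality endpoint URL.

    The "transformers" path segment mirrors the curl example above;
    its meaning is an assumption here.
    """
    return f"{BASE}/{registry}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report for a repo."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URL used in the curl example; uncomment the fetch to
    # actually hit the API (subject to the daily rate limit).
    print(quality_url("haotian-liu", "LLaVA"))
    # print(fetch_quality("haotian-liu", "LLaVA"))
```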