robitec97/gemma3.c
Gemma 3 pure inference in C
Implements Gemma 3 4B inference with native SentencePiece tokenization (262K vocab) and memory-mapped BF16 SafeTensors weights, supporting hybrid attention with grouped query attention and 128K context windows. Offers Metal GPU acceleration for Apple Silicon, optional OpenBLAS BLAS operations, and multi-threaded CPU inference, with both CLI and C library interfaces. Achieves ~3GB runtime memory via KV cache scaling and includes interactive chat mode with multi-turn conversation history.
Stars: 112
Forks: 5
Language: C
License: —
Category:
Last pushed: Feb 04, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/robitec97/gemma3.c"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
GURPREETKAURJETHRA/PaliGemma-Inference-and-Fine-Tuning
PaliGemma Inference and Fine Tuning
GURPREETKAURJETHRA/PaliGemma-FineTuning
PaliGemma FineTuning
LikithMeruvu/Gemma2B_Finetuning_Medium
This repo shows how to fine-tune Google's new Gemma LLM using your custom instruction...
natnew/Gemma-Open-Models
Gemma is a family of lightweight, state-of-the-art open models built from the same research and...
stabgan/biogemma
BioGemma — Google Gemma 3 1B fine-tuned on medical/biomedical corpus for clinical NLP tasks