robitec97/gemma3.c
Gemma 3 pure inference in C
Implements Gemma 3 4B inference with native SentencePiece tokenization (262K vocab) and memory-mapped BF16 SafeTensors weights, supporting hybrid attention with grouped query attention and 128K context windows. Offers Metal GPU acceleration for Apple Silicon, optional OpenBLAS BLAS operations, and multi-threaded CPU inference, with both CLI and C library interfaces. Achieves ~3GB runtime memory via KV cache scaling and includes interactive chat mode with multi-turn conversation history.
Stars: 112
Forks: 5
Language: C
License: —
Category:
Last pushed: Feb 04, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/robitec97/gemma3.c"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
GURPREETKAURJETHRA/PaliGemma-Inference-and-Fine-Tuning
PaliGemma Inference and Fine Tuning
GURPREETKAURJETHRA/PaliGemma-FineTuning
PaliGemma FineTuning
LikithMeruvu/Gemma2B_Finetuning_Medium
This repo shows how to fine-tune Google's new Gemma LLM using your custom instruction...
natnew/Gemma-Open-Models
Gemma is a family of lightweight, state-of-the-art open models built from the same research and...
stabgan/biogemma
BioGemma — Google Gemma 3 1B fine-tuned on medical/biomedical corpus for clinical NLP tasks