cactus-compute/cactus

Low-latency AI engine for mobile devices & wearables

Score: 69 / 100 (Established)

Provides OpenAI-compatible APIs for multimodal inference (chat, vision, speech, embeddings, RAG), using zero-copy memory mapping that reduces RAM usage 10x. Built on a custom PyTorch-based computation graph with ARM SIMD kernels optimized per chipset (Apple, Snapdragon, Exynos). Supports automatic cloud fallback for requests that exceed device capabilities, plus NPU-accelerated prefill for models from 270M to 8B parameters. Runs on iOS, Android, macOS, and Linux via C, C++, Python, Swift, Kotlin, Flutter, Rust, and JavaScript SDKs.
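The description advertises OpenAI-compatible APIs for chat. As a rough sketch, a request to such an API carries a JSON body like the one below; the model name and message content are illustrative assumptions, not details taken from this page.

```python
import json

# Sketch of an OpenAI-compatible chat request body.
# "local-model" is a hypothetical identifier; cactus's actual model
# names are not listed on this page.
payload = {
    "model": "local-model",
    "messages": [
        {"role": "user", "content": "Summarize this note in one sentence."}
    ],
    "max_tokens": 64,
}

# Serialize for an HTTP POST to the engine's chat endpoint.
body = json.dumps(payload)
print(body)
```

Because the wire format matches OpenAI's, existing OpenAI client libraries can typically be pointed at such an engine by overriding the base URL.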

4,430 stars. Actively maintained with 54 commits in the last 30 days.

No package · No dependents
Maintenance: 25 / 25
Adoption: 10 / 25
Maturity: 15 / 25
Community: 19 / 25


Stars: 4,430
Forks: 328
Language: C
License: (none listed)
Last pushed: Mar 13, 2026
Commits (30d): 54

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/cactus-compute/cactus"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
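The same endpoint can be called from Python. The sketch below builds the URL from the curl example; the assumption that the two path segments after /quality/rag/ are the repository owner and name is inferred from that one example and may not hold for other routes.

```python
from urllib.parse import quote

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a GitHub owner/repo pair
    (path layout assumed from the single example on this page)."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

url = quality_url("cactus-compute", "cactus")
print(url)

# To actually fetch the data (network access required):
#   import json, urllib.request
#   data = json.load(urllib.request.urlopen(url))
```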