cactus-compute/cactus
Low-latency AI engine for mobile devices & wearables
Provides OpenAI-compatible APIs for multimodal inference (chat, vision, speech, embeddings, RAG), with zero-copy memory mapping that cuts RAM usage roughly 10x. Built on a custom PyTorch-based computation graph and ARM SIMD kernels tuned per chipset (Apple, Snapdragon, Exynos). Supports automatic cloud fallback for requests that exceed device capabilities, plus NPU-accelerated prefill for models from 270M to 8B parameters, running on iOS, Android, macOS, and Linux via C, C++, Python, Swift, Kotlin, Flutter, Rust, and JavaScript SDKs.
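Since the engine exposes OpenAI-compatible endpoints, a client can talk to it with a plain OpenAI-style chat request. A minimal sketch, assuming a locally served endpoint; the host, port, and model name below are illustrative assumptions, not documented values:

```python
# Hypothetical sketch of an OpenAI-compatible chat request.
# base_url, port, and the model name "qwen-600m" are assumptions,
# not values documented by the cactus project.
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://localhost:8000", "qwen-600m", "Hello!")
# Sending would be: urllib.request.urlopen(req)  -- requires a running server.
```

Any OpenAI SDK or compatible client should work the same way by pointing its base URL at the device-local server.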
4,430 stars. Actively maintained with 54 commits in the last 30 days.
Stars: 4,430
Forks: 328
Language: C
License: —
Category: —
Last pushed: Mar 13, 2026
Commits (30d): 54
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/cactus-compute/cactus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
langbot-app/LangBot
Production-grade platform for building agentic IM bots across multiple messaging platforms. Provides agents, knowledge-base orchestration, and a plugin system /...
open-webui/open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
sigoden/aichat
All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with...
rudrankriyam/Foundation-Models-Framework-Example
Example apps for Foundation Models Framework in iOS 26 and macOS 26
Light-Heart-Labs/DreamServer
One command to a fully local AI stack — LLM inference, chat UI, voice, agents, workflows, RAG,...