cactus-compute/cactus
Low-latency AI engine for mobile devices & wearables
Provides OpenAI-compatible APIs for multimodal inference (chat, vision, speech, embeddings, RAG), with zero-copy memory mapping that cuts RAM usage roughly 10x. Built on a custom PyTorch-based computation graph and ARM SIMD kernels tuned per chipset (Apple, Snapdragon, Exynos). Supports automatic cloud fallback for requests that exceed device capabilities, plus NPU-accelerated prefill for models from 270M to 8B parameters, running on iOS, Android, macOS, and Linux via C, C++, Python, Swift, Kotlin, Flutter, Rust, and JavaScript SDKs.
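Since the engine exposes OpenAI-compatible endpoints, a client can talk to it with a plain OpenAI-style chat request. A minimal sketch, assuming a locally served endpoint; the host, port, and model name below are illustrative assumptions, not documented values:

```python
# Hypothetical sketch of an OpenAI-compatible chat request.
# base_url, port, and the model name "qwen-600m" are assumptions,
# not values documented by the cactus project.
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://localhost:8000", "qwen-600m", "Hello!")
# Sending would be: urllib.request.urlopen(req)  -- requires a running server.
```

Any OpenAI SDK or compatible client should work the same way by pointing its base URL at the device-local server.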
4,430 stars. Actively maintained with 54 commits in the last 30 days.
Stars: 4,430
Forks: 328
Language: C
License: —
Category: —
Last pushed: Mar 13, 2026
Commits (30d): 54
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/cactus-compute/cactus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
langbot-app/LangBot
Production-grade platform for building agentic IM bots across multiple messaging platforms. Provides agents, knowledge-base orchestration, and a plugin system /...
open-webui/open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
sigoden/aichat
All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with...
rudrankriyam/Foundation-Models-Framework-Example
Example apps for Foundation Models Framework in iOS 26 and macOS 26
Light-Heart-Labs/DreamServer
One command to a fully local AI stack — LLM inference, chat UI, voice, agents, workflows, RAG,...