Softlandia-Ltd/vision-is-all-you-need
Serverless Modal + FastAPI + React + ColPali + Qdrant + GPT-4o Vision RAG (V-RAG) Demo
Implements document retrieval without text chunking. PDF pages are converted to images and embedded directly with ColPali, a vision language model optimized for document retrieval. For each query, the system retrieves the most relevant page images from Qdrant, then grounds GPT-4o's answer in those page images rather than in extracted text. It deploys as serverless functions on Modal with a React frontend, skipping the text-extraction and chunking steps of text-based RAG pipelines while preserving document layout and visual context.
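ColPali produces a multi-vector embedding: one vector per query token and one per image patch of a page. These are typically compared with late-interaction ("MaxSim") scoring, where each query vector is matched to its best page-patch vector and the per-token maxima are summed. Qdrant can compute this server-side on multivector fields; whether this repo relies on that or scores elsewhere is an assumption not confirmed here. The sketch below shows only the scoring idea over plain arrays. `MultiVector`, `maxSim`, and `topK` are hypothetical names, not part of the repo or any library.

```typescript
// Hypothetical type: a multi-vector embedding is a list of equal-length
// vectors (one per query token, or one per page image patch).
type MultiVector = number[][];

// Plain dot product between two vectors of the same dimension.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Late-interaction (MaxSim) score: for every query vector, keep the
// highest dot product against any page vector, then sum those maxima.
function maxSim(query: MultiVector, page: MultiVector): number {
  let total = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const p of page) best = Math.max(best, dot(q, p));
    total += best;
  }
  return total;
}

// Score every page against the query and return the indices of the
// k highest-scoring pages, best first.
function topK(query: MultiVector, pages: MultiVector[], k: number): number[] {
  return pages
    .map((page, index) => ({ index, score: maxSim(query, page) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.index);
}
```

Because each query token picks its own best-matching patch, a page can score well even when the relevant content occupies only a small region of it, which is what lets retrieval work on whole page images without chunking.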
405 stars. No commits in the last 6 months.
Stars: 405
Forks: 52
Language: TypeScript
License: MIT
Last pushed: Jun 26, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/Softlandia-Ltd/vision-is-all-you-need"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
Azure-Samples/serverless-chat-langchainjs
Build your own serverless AI Chat with Retrieval-Augmented-Generation using LangChain.js,...
GitHamza0206/simba
Open-source, production-ready customer service with built-in evals and monitoring
Cocolalilal/LastChat
A fork of Rikkahub with an overhauled UI and added features
crawlchat/crawlchat
Turn your documentation into an AI assistant that answers questions instantly
Dcup-dev/dcup
Dcup - Advanced RAG for Personal Knowledge ☕