umbertogriffo/rag-chatbot
RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.
Combines llama.cpp with a Chroma vector store to enable conversation-aware RAG using locally quantized models. Markdown files are split by recursive chunking, and two strategies handle context overflow: sequential refinement and hierarchical summarization. Supports CUDA and Metal GPU acceleration on Linux and macOS, uses sentence-transformers for embeddings, and includes chat history as context for multi-turn interactions.
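To illustrate what recursive Markdown chunking can look like, here is a minimal sketch. It is not taken from the repository: the separator list, the `recursive_split` name, and the character-based size limit are all assumptions made for illustration. The idea is to try the coarsest separator first (headings) and fall back to finer ones only for pieces that are still too large.

```python
from typing import List, Optional

# Hypothetical separator order, coarsest to finest (an assumption, not the repo's list).
SEPARATORS = ["\n## ", "\n### ", "\n\n", "\n", " "]


def recursive_split(text: str, max_chars: int,
                    separators: Optional[List[str]] = None) -> List[str]:
    """Split text into chunks of at most max_chars, preferring coarse separators."""
    if separators is None:
        separators = SEPARATORS
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # Separator not present: try the next, finer one.
        return recursive_split(text, max_chars, finer)

    chunks: List[str] = []
    current = ""
    for i, part in enumerate(parts):
        # Keep the separator attached to the piece that follows it.
        piece = part if i == 0 else sep + part
        if len(current) + len(piece) <= max_chars:
            current += piece
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) > max_chars:
            # Piece is still too large: recurse with finer separators only.
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
        else:
            current = piece
    if current.strip():
        chunks.append(current)
    return chunks


doc = (
    "# Title\n\nIntro paragraph.\n"
    "## Setup\n\nInstall the package.\n"
    "## Usage\n\nRun the chatbot and ask a question."
)
chunks = recursive_split(doc, max_chars=40)
for chunk in chunks:
    print(repr(chunk))
```

A real pipeline would more likely measure chunk size in tokens rather than characters and add some overlap between neighbouring chunks; both are left out here to keep the sketch short.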
Stars: 387
Forks: 97
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/umbertogriffo/rag-chatbot"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
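The same request can be made from Python using only the standard library. This is a rough sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not state. How an API key is passed is not documented here, so the sketch covers only the keyless call.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given GitHub owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository (assumes a JSON response body)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(quality_url("umbertogriffo", "rag-chatbot"))
```

Calling `fetch_quality("umbertogriffo", "rag-chatbot")` performs the actual network request and counts against the daily request limit.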
Related tools
AleksNeStu/ai-real-estate-assistant
Advanced AI Real Estate Assistant using RAG, LLMs, and Python. Features market analysis,...
Azure-Samples/aisearch-openai-rag-audio
A simple example implementation of the VoiceRAG pattern to power interactive voice generative AI...
gurveervirk/ToK
Simple, High Quality, Open Source RAG solution for chatting with your documents
amajji/llm-rag-chatbot-with-langchain
Development and deployment on AWS of a question-answer LLM model using Llama2 with 7B parameters...
jayeshmahapatra/rag-chatbot
Retrieval Augmented Generation (RAG) chatbot for my blog