marklysze/LlamaIndex-RAG-WSL-CUDA
Examples of RAG using LlamaIndex with local LLMs - Gemma, Mixtral 8x7B, Llama 2, Mistral 7B, Orca 2, Phi-2, Neural 7B
Demonstrates RAG with quantized GGUF models running locally via llama-cpp-python, optimized for Nvidia CUDA acceleration in WSL environments, and includes Word document ingestion with source attribution. Built on LlamaIndex Core with configurable prompt templates for different model architectures, enabling semantic search across documents with retrieval-augmented generation capabilities. Provides production-ready notebooks comparing outputs across multiple model sizes (2B–34B parameters) to assess quality-versus-resource tradeoffs.
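The "configurable prompt templates for different model architectures" mentioned above can be illustrated with a small sketch. The function names below are illustrative, not the repo's exact helpers; the LlamaCPP class in the llama-index-llms-llama-cpp integration accepts callables like these via its `messages_to_prompt` / `completion_to_prompt` parameters, and each model family (Llama 2, Mistral, Phi-2, ...) needs its own wrapping. Shown here is the Llama 2-chat format as one example:

```python
def completion_to_prompt(completion: str) -> str:
    """Wrap a bare completion request in Llama 2's [INST] chat markers."""
    return f"<s>[INST] {completion.strip()} [/INST] "


def messages_to_prompt(messages) -> str:
    """Fold (role, content) chat turns into a single Llama 2 prompt string.

    A system message is embedded in a <<SYS>> block inside the first user
    instruction, per the Llama 2 chat format.
    """
    system = ""
    parts = []
    for role, content in messages:
        if role == "system":
            system = f"<<SYS>>\n{content.strip()}\n<</SYS>>\n\n"
        elif role == "user":
            parts.append(f"<s>[INST] {system}{content.strip()} [/INST] ")
            system = ""  # only the first instruction carries the system block
        elif role == "assistant":
            parts.append(f"{content.strip()} </s>")
    return "".join(parts)
```

A different model (e.g. Mistral 7B Instruct or Phi-2) would swap in its own template functions while leaving the rest of the LlamaIndex pipeline unchanged, which is the comparison these notebooks set up.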
132 stars. No commits in the last 6 months.
Stars: 132
Forks: 14
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Feb 25, 2024
Commits (30d): 0
Higher-rated alternatives
run-llama/llama_index
LlamaIndex is the leading document agent and OCR platform
emarco177/documentation-helper
Reference implementation of a RAG-based documentation helper using LangChain, Pinecone, and Tavily.
janus-llm/janus-llm
Leveraging LLMs for modernization through intelligent chunking, iterative prompting and...
JetXu-LLM/llama-github
Llama-github is an open-source Python library that empowers LLM Chatbots, AI Agents, and...
Vasallo94/ObsidianRAG
RAG system to query your Obsidian notes using LangGraph and local LLMs (Ollama)