marklysze/LlamaIndex-RAG-WSL-CUDA

Examples of RAG using LlamaIndex with local LLMs - Gemma, Mixtral 8x7B, Llama 2, Mistral 7B, Orca 2, Phi-2, Neural 7B

Score: 31 / 100 (Emerging)

Demonstrates RAG with quantized GGUF models running locally via llama-cpp-python, using NVIDIA CUDA acceleration under WSL, and includes Word document ingestion with source attribution. Built on LlamaIndex Core with configurable prompt templates for different model architectures, enabling semantic search across documents with retrieval-augmented generation. Provides notebooks comparing outputs across multiple model sizes (2B-34B parameters) to assess quality-versus-resource tradeoffs.

132 stars. No commits in the last 6 months.

No License · Stale (6 months) · No Package · No Dependents

Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 8 / 25
Community: 13 / 25
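The overall score appears to be the plain sum of the four 25-point categories, which matches the 31 / 100 shown above. A quick check (the summation rule is an assumption inferred from the numbers on this card, not documented by the scoring service):

```python
# Category scores shown on the card (each out of 25)
scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 8,
    "Community": 13,
}

# Assumption: the overall score is a simple sum of the four categories
total = sum(scores.values())
print(f"{total} / 100")  # → 31 / 100, matching the card
```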


Stars: 132
Forks: 14
Language: Jupyter Notebook
License: None
Last pushed: Feb 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/marklysze/LlamaIndex-RAG-WSL-CUDA"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
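The endpoint path appears to follow the pattern `/api/v1/quality/{category}/{owner}/{repo}`; this pattern is inferred from the single example above, with "rag" as the category segment for this repository. A minimal Python helper under that assumption:

```python
# Base URL taken from the curl example above
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality API URL for a repo.

    The path pattern {category}/{owner}/{repo} is an assumption
    inferred from the single documented example.
    """
    return f"{BASE}/{category}/{owner}/{repo}"

print(quality_url("rag", "marklysze", "LlamaIndex-RAG-WSL-CUDA"))
# → https://pt-edge.onrender.com/api/v1/quality/rag/marklysze/LlamaIndex-RAG-WSL-CUDA
```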