marklysze/LlamaIndex-RAG-WSL-CUDA
Examples of RAG using LlamaIndex with local LLMs - Gemma, Mixtral 8x7B, Llama 2, Mistral 7B, Orca 2, Phi-2, Neural 7B
Demonstrates RAG with quantized GGUF models running locally via llama-cpp-python, optimized for Nvidia CUDA acceleration in WSL environments, and includes Word document ingestion with source attribution. Built on LlamaIndex Core with configurable prompt templates for different model architectures, enabling semantic search across documents with retrieval-augmented generation capabilities. Provides production-ready notebooks comparing outputs across multiple model sizes (2B–34B parameters) to assess quality-versus-resource tradeoffs.
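The "configurable prompt templates for different model architectures" mentioned above can be illustrated with a small sketch. The function names below are illustrative, not the repo's exact helpers; the LlamaCPP class in the llama-index-llms-llama-cpp integration accepts callables like these via its `messages_to_prompt` / `completion_to_prompt` parameters, and each model family (Llama 2, Mistral, Phi-2, ...) needs its own wrapping. Shown here is the Llama 2-chat format as one example:

```python
def completion_to_prompt(completion: str) -> str:
    """Wrap a bare completion request in Llama 2's [INST] chat markers."""
    return f"<s>[INST] {completion.strip()} [/INST] "


def messages_to_prompt(messages) -> str:
    """Fold (role, content) chat turns into a single Llama 2 prompt string.

    A system message is embedded in a <<SYS>> block inside the first user
    instruction, per the Llama 2 chat format.
    """
    system = ""
    parts = []
    for role, content in messages:
        if role == "system":
            system = f"<<SYS>>\n{content.strip()}\n<</SYS>>\n\n"
        elif role == "user":
            parts.append(f"<s>[INST] {system}{content.strip()} [/INST] ")
            system = ""  # only the first instruction carries the system block
        elif role == "assistant":
            parts.append(f"{content.strip()} </s>")
    return "".join(parts)
```

A different model (e.g. Mistral 7B Instruct or Phi-2) would swap in its own template functions while leaving the rest of the LlamaIndex pipeline unchanged, which is the comparison these notebooks set up.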
132 stars. No commits in the last 6 months.
Stars: 132
Forks: 14
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Feb 25, 2024
Commits (30d): 0
Higher-rated alternatives
run-llama/llama_index
LlamaIndex is the leading document agent and OCR platform
emarco177/documentation-helper
Reference implementation of a RAG-based documentation helper using LangChain, Pinecone, and Tavily.
janus-llm/janus-llm
Leveraging LLMs for modernization through intelligent chunking, iterative prompting and...
JetXu-LLM/llama-github
Llama-github is an open-source Python library that empowers LLM Chatbots, AI Agents, and...
Vasallo94/ObsidianRAG
RAG system to query your Obsidian notes using LangGraph and local LLMs (Ollama)