arcee-ai/DALM
Domain Adapted Language Modeling Toolkit - E2E RAG
Implements fully differentiable end-to-end RAG training that jointly optimizes retriever and decoder-only generator models (Llama, Falcon, GPT) using in-batch negatives for efficiency. Supports both retriever-only contrastive learning and joint RAG-e2e fine-tuning pipelines with synthetic data generation via the `dalm` CLI, compatible with any Hugging Face embedding or language model. Includes evaluation harness for retriever recall/hit-rate metrics and pre-built domain-adapted examples (patents, PubMed, SEC filings).
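The "in-batch negatives" trick mentioned above can be sketched in a few lines: within a training batch, query i's matching passage is the positive, and every other passage in the same batch serves as a free negative, so no extra negative mining is needed. The following is a minimal pure-Python sketch of the general technique, not DALM's actual implementation:

```python
import math

def in_batch_negatives_loss(queries, passages, temperature=0.07):
    """Contrastive loss with in-batch negatives.

    queries[i] and passages[i] are matching (positive) embedding pairs;
    for query i, every passages[j] with j != i acts as a negative.
    Illustrative sketch only -- not code from the DALM repository.
    """
    n = len(queries)
    losses = []
    for i in range(n):
        # Similarity of query i to every passage in the batch.
        sims = [sum(a * b for a, b in zip(queries[i], passages[j])) / temperature
                for j in range(n)]
        # Numerically stable log-sum-exp; the positive sits on the diagonal.
        m = max(sims)
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        losses.append(log_z - sims[i])  # -log p(positive passage | query)
    return sum(losses) / n
```

With aligned pairs the loss is near zero; if the positives are shuffled away from the diagonal, it grows large, which is what drives the retriever toward matching embeddings.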
335 stars. No commits in the last 6 months.
Stars: 335
Forks: 46
Language: Python
License: Apache-2.0
Category:
Last pushed: Nov 08, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/arcee-ai/DALM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
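For programmatic use, the curl call above can be wrapped in a small stdlib-only Python client. The URL shape is taken directly from the example; the `Authorization: Bearer` header for keyed access and the JSON response fields are assumptions, not documented behavior:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Build the API URL for a repo (path shape taken from the curl example)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, api_key=None, timeout=10):
    """Fetch and decode the quality JSON for one repository.

    api_key is optional (100 requests/day without one, per the page above).
    The Bearer-token header scheme here is an assumption.
    """
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed scheme
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("rag", "arcee-ai", "DALM")` requests the same endpoint as the curl command shown above.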
Higher-rated alternatives
LearningCircuit/local-deep-research
Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports...
NVIDIA-AI-Blueprints/rag
This NVIDIA RAG blueprint serves as a reference solution for a foundational Retrieval Augmented...
Denis2054/RAG-Driven-Generative-AI
This repository provides programs to build Retrieval Augmented Generation (RAG) code for...
0verL1nk/PaperSage
📚 AI-powered research reading workbench. Project-based paper Q&A with Hybrid RAG, multi-agent...
RapidFireAI/rapidfireai
RapidFire AI: Rapid AI Customization from RAG to Fine-Tuning