arcee-ai/DALM
Domain Adapted Language Modeling Toolkit - E2E RAG
Implements fully differentiable end-to-end RAG training that jointly optimizes retriever and decoder-only generator models (Llama, Falcon, GPT) using in-batch negatives for efficiency. Supports both retriever-only contrastive learning and joint RAG-e2e fine-tuning pipelines with synthetic data generation via the `dalm` CLI, compatible with any Hugging Face embedding or language model. Includes evaluation harness for retriever recall/hit-rate metrics and pre-built domain-adapted examples (patents, PubMed, SEC filings).
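The "in-batch negatives" trick mentioned above can be sketched in a few lines: within a training batch, query i's matching passage is the positive, and every other passage in the same batch serves as a free negative, so no extra negative mining is needed. The following is a minimal pure-Python sketch of the general technique, not DALM's actual implementation:

```python
import math

def in_batch_negatives_loss(queries, passages, temperature=0.07):
    """Contrastive loss with in-batch negatives.

    queries[i] and passages[i] are matching (positive) embedding pairs;
    for query i, every passages[j] with j != i acts as a negative.
    Illustrative sketch only -- not code from the DALM repository.
    """
    n = len(queries)
    losses = []
    for i in range(n):
        # Similarity of query i to every passage in the batch.
        sims = [sum(a * b for a, b in zip(queries[i], passages[j])) / temperature
                for j in range(n)]
        # Numerically stable log-sum-exp; the positive sits on the diagonal.
        m = max(sims)
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        losses.append(log_z - sims[i])  # -log p(positive passage | query)
    return sum(losses) / n
```

With aligned pairs the loss is near zero; if the positives are shuffled away from the diagonal, it grows large, which is what drives the retriever toward matching embeddings.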
335 stars. No commits in the last 6 months.
Stars: 335
Forks: 46
Language: Python
License: Apache-2.0
Category:
Last pushed: Nov 08, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/arcee-ai/DALM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
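For programmatic use, the curl call above can be wrapped in a small stdlib-only Python client. The URL shape is taken directly from the example; the `Authorization: Bearer` header for keyed access and the JSON response fields are assumptions, not documented behavior:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Build the API URL for a repo (path shape taken from the curl example)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, api_key=None, timeout=10):
    """Fetch and decode the quality JSON for one repository.

    api_key is optional (100 requests/day without one, per the page above).
    The Bearer-token header scheme here is an assumption.
    """
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed scheme
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("rag", "arcee-ai", "DALM")` requests the same endpoint as the curl command shown above.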
Higher-rated alternatives
LearningCircuit/local-deep-research
Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports...
NVIDIA-AI-Blueprints/rag
This NVIDIA RAG blueprint serves as a reference solution for a foundational Retrieval Augmented...
Denis2054/RAG-Driven-Generative-AI
This repository provides programs to build Retrieval Augmented Generation (RAG) code for...
0verL1nk/PaperSage
📚 AI-powered research reading workbench. Project-based paper Q&A with Hybrid RAG, multi-agent...
RapidFireAI/rapidfireai
RapidFire AI: Rapid AI Customization from RAG to Fine-Tuning