UKPLab/PeerQA

Code and Data for PeerQA: A Scientific Question Answering Dataset from Peer Reviews, NAACL 2025


Emerging

Constructed from peer-review questions answered by the papers' authors, the dataset contains 579 QA pairs over full scientific papers averaging 12k tokens. It is designed to evaluate three tasks: evidence retrieval, answerability classification, and long-context answer generation. The project uses GROBID 0.8 for PDF-to-text extraction and establishes baselines with BM25 (Pyserini), dense retrievers, and cross-encoder rerankers; decontextualization is shown to improve retrieval across architectures. The dataset is available on Hugging Face Datasets, and the evaluation scripts support transformer-based models including DeepSeek-R1 and Qwen variants.
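As a rough illustration of the evidence-retrieval task, the sketch below ranks the paragraphs of a paper against a reviewer question with a plain BM25 scorer. It is a self-contained approximation for illustration only, not the project's Pyserini baseline: the function names (`tokenize`, `bm25_rank`), the whitespace tokenizer, the `k1` and `b` values, and the toy paragraphs are all assumptions made here.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bm25_rank(question, paragraphs, k1=0.9, b=0.4):
    """Score each paragraph against the question with Okapi BM25.

    Returns (ranking, scores), where ranking lists paragraph indices
    from best to worst match. k1 and b are illustrative choices.
    """
    docs = [tokenize(p) for p in paragraphs]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of paragraphs containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(question))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)

    ranking = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Toy stand-ins for paragraphs extracted from one paper (not real PeerQA data).
paragraphs = [
    "We train the model with a learning rate of 0.001 for ten epochs.",
    "The dataset was collected from public forums and filtered for quality.",
    "Related work on question answering is discussed in section two.",
]
question = "what learning rate was used to train the model"

ranking, scores = bm25_rank(question, paragraphs)
print(ranking)
```

In this toy setup the first paragraph shares the most question terms and is ranked first, while the third shares none and scores zero. A dense retriever or cross-encoder reranker would replace the scoring function but keep the same question-to-paragraph ranking shape.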

No Package, No Dependents

Maintenance 10 / 25

Adoption 5 / 25

Maturity 9 / 25

Community 15 / 25

Language

Python

License

Apache-2.0

Higher-rated alternatives

GrapeCity-AI/gc-qa-rag

A RAG (Retrieval-Augmented Generation) solution based on advanced pre-generated QA pairs...

Vbj1808/Dokis

Lightweight RAG provenance middleware. Verifies every claim in an LLM response is grounded in a...

Arfazrll/RAG-DocsInsight-Engine

Retrieval Augmented Generation (RAG) engine for intelligent document analysis. integrating LLM,...

pcastiglione99/RAGify-Search

RAGify is designed to enhance search capabilities using Retrieval-Augmented Generation (RAG). By...

Adii2202/RAG-AI-Voice-assistant-

Performing a RAG (Retrieval Augmented Generation) assessment using voice-to-voice query...
