Ayanami0730/deep_research_bench

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents


Established

Comprises 100 PhD-level research tasks, designed by domain experts and spanning 22 domains. Agent-generated reports are scored on multiple dimensions, including factuality, reasoning quality, and report comprehensiveness. The dataset and leaderboard are hosted on Hugging Face, and automated scoring uses Gemini models as judges across two frameworks: RACE, which rates report quality against expert-curated reference reports, and FACT, which assesses factual accuracy and citation trustworthiness. Both proprietary and open-source deep research agents are supported, enabling standardized comparison of agentic research capabilities.

637 stars. Actively maintained with 8 commits in the last 30 days.

No package; no dependents.

Maintenance 20 / 25

Adoption 10 / 25

Maturity 15 / 25

Community 18 / 25


Stars: 637

Language: Python

License: Apache-2.0

Related tools

Hsankesara/DeepResearch

A collection of research papers in deep learning, computer vision, and NLP.

QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling

Awesome-Biomolecule-Language-Cross-Modeling: a curated list of resources for paper "Leveraging...

thuiar/OKD-Reading-List

Papers for Open Knowledge Discovery

roomylee/nlp-papers-with-arxiv

Statistics and accepted-paper lists for NLP conferences, with arXiv links

iwangjian/Paper-Reading-ConvAI

📖 Paper reading list in conversational AI.
