Structured Data Inference NLP Tools

Datasets and benchmarks for NLI, table understanding, text-to-SQL, and instruction-following tasks involving structured or semi-structured data. This category does NOT include general sentiment analysis, classification tasks without structured reasoning components, or commonsense knowledge resources without explicit inference evaluation.

There are 74 structured data inference tools tracked. The highest-rated is ymcui/cmrc2018 at 42/100 with 451 stars.

Get all 74 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=structured-data-inference&limit=20"
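The same request can be sketched in Python using only the standard library. The URL and query parameters are taken from the curl command above; the helper names `build_url` and `fetch_quality` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch only decodes it without assuming any field names. Whether a larger `limit` value returns all 74 projects in one call is also an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="structured-data-inference", limit=20):
    """Assemble the request URL with the same query parameters as the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Fetch the URL and decode the body as JSON (response structure not assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a network request, so it is left commented out):
# payload = fetch_quality(build_url())
# print(json.dumps(payload, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the query string before spending any of the daily request allowance.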

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
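A script that polls the API can keep its own count so that it stays under the daily allowance. The following is a minimal client-side sketch: the `DailyBudget` class is hypothetical, and resetting the counter when the local calendar date changes is an assumption, since the page does not say when the server-side quota resets.

```python
from datetime import date


class DailyBudget:
    """Client-side counter that refuses requests once a daily limit is reached."""

    def __init__(self, limit=100):
        self.limit = limit  # 100/day without a key, 1,000/day with a free key
        self.day = None     # the date the current count belongs to
        self.used = 0       # requests spent on that date

    def try_spend(self, today=None):
        """Return True and count one request if budget remains, else False."""
        today = today or date.today()
        if today != self.day:
            # Assumed reset rule: start a fresh count on a new calendar day.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


budget = DailyBudget(limit=100)
if budget.try_spend():
    pass  # safe to issue one request here
```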

# | Tool | Description | Score | Tier
1 | ymcui/cmrc2018 | A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018) | 42 | Emerging
2 | princeton-nlp/DensePhrases | [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021:... | 38 | Emerging
3 | thunlp/MultiRD | Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model" | 38 | Emerging
4 | IndexFziQ/KMRC-Papers | A list of recent papers regarding knowledge-based machine reading comprehension. | 35 | Emerging
5 | danqi/rc-cnn-dailymail | CNN/Daily Mail Reading Comprehension Task | 33 | Emerging
6 | declare-lab/CIDER | This repository contains the dataset and the pytorch implementations of the... | 32 | Emerging
7 | maastrichtlawtech/gdsr | 🕸️ A graph-augmented dense statute retriever. (EACL 2023) | 32 | Emerging
8 | intfloat/SimKGC | ACL 2022, SimKGC: Simple Contrastive Knowledge Graph Completion with... | 32 | Emerging
9 | zjunlp/MKG_Analogy | [ICLR 2023] Multimodal Analogical Reasoning over Knowledge Graphs | 32 | Emerging
10 | ShiZhengyan/StepGame | [AAAI 2022] Dataset and pytorch codes for the paper titled "StepGame: A New... | 32 | Emerging
11 | shmsw25/AmbigQA | An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous... | 31 | Emerging
12 | GeekDream-x/IDOL | Repo for paper "IDOL: Indicator-oriented Logic Pre-training for Logical... | 30 | Emerging
13 | IndexFziQ/MSMARCO-MRC-Analysis | Analysis on the MS-MARCO leaderboard regarding the machine reading... | 30 | Emerging
14 | utahnlp/knowledge_infotabs | Repository containing code for the NAACL 2021 paper (Incorporating External... | 30 | Emerging
15 | yuweihao/reclor | Code for "ReClor: A Reading Comprehension Dataset Requiring Logical... | 29 | Experimental
16 | XingLuxi/KMRC-Research-Archive | 🗂 Research about Knowledge-based Machine Reading Comprehension | 28 | Experimental
17 | phanxuanphucnd/Active-learning-in-NLP | Active learning in NLP | 28 | Experimental
18 | FeiWang96/GTR | [SIGIR 2021] Retrieving Complex Tables with Multi-Granular Graph... | 27 | Experimental
19 | amazon-science/pizza-semantic-parsing-dataset | The PIZZA dataset continues the exploration of task-oriented parsing by... | 27 | Experimental
20 | anshitag/memit_csk | Source repository for Editing Common Sense in Transformers (EMNLP 2023) | 27 | Experimental
21 | webis-de/acl22-revisiting-uncertainty-based-query-strategies-for-active-learning-with-transformers | Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers | 27 | Experimental
22 | marceljahnke/negative-cache | PyTorch Implementation of the Paper "Efficient Training of Retrieval Models... | 26 | Experimental
23 | amazon-science/wqa-multi-sentence-inference | This repository contains code used for our Multi Sentence Inference NAACL'22 paper. | 25 | Experimental
24 | ymcui/expmrc | ExpMRC: Explainability Evaluation for Machine Reading Comprehension | 25 | Experimental
25 | sherlcok314159/ChineseMRC-Data | A collection of the extractive MRC datasets available so far for Chinese | 25 | Experimental
26 | thunlp/CokeBERT | CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced... | 25 | Experimental
27 | acidAnn/semeval2022_task7_starter_kit | 💡 Starter kit for SemEval 2022 Task 7: Identifying Plausible... | 25 | Experimental
28 | USSiamaboat/polytuplet-loss | A Reverse Approach to Training Reading Comprehension and Logical Reasoning Models | 24 | Experimental
29 | humanlab/rare-class-AL | AL for rare class strategies compared in the paper "Transfer and Active... | 24 | Experimental
30 | ict-bigdatalab/CorpusBrain | CIKM 2022: CorpusBrain: Pre-train a Generative Retrieval Model for... | 24 | Experimental
31 | ai-systems/tg2022task_premise_retrieval | TextGraphs Shared Task on Natural Language Premise Selection | 24 | Experimental
32 | Jordy-VL/uncertainty-bench | Code repository for Benchmarking Scalable Predictive Uncertainty in Text... | 24 | Experimental
33 | semeval-2026-kclarity/clarity | Code release for KCLarity at SemEval-2026 Task 6: Encoder and Zero-Shot... | 24 | Experimental
34 | Dibyakanti/AutoTNLI-code | This repository contains the official code for the paper: Realistic Data... | 23 | Experimental
35 | testzer0/AmbiQT | Code and Assets for "Benchmarking and Improving Text-to-SQL Generation Under... | 22 | Experimental
36 | psunlpgroup/XSemPLR | Data and code for ACL 2023 paper XSemPLR: Cross-Lingual Semantic Parsing in... | 22 | Experimental
37 | ZeinabAghahadi/Syllogistic-Commonsense-Reasoning | Deductive Commonsense Reasoning | 21 | Experimental
38 | pietrolesci/anchoral | This is the official PyTorch implementation for our NAACL 2024 paper:... | 21 | Experimental
39 | krystalan/Multi-hopRC | 📔 notes for Multi-hop Reading Comprehension... | 21 | Experimental
40 | minnesotanlp/infoVerse | Jaehyung Kim et al.'s ACL 2023 paper on "infoVerse: A Universal Framework for... | 20 | Experimental
41 | Pzoom522/xANLG | Data and code for "Understanding Linearity of Cross-Lingual Word Embedding... | 20 | Experimental
42 | cognitiveailab/tg2021task | Participant Kit for the TextGraphs-15 Shared Task on Explanation Regeneration | 20 | Experimental
43 | INK-USC/RiddleSense | RiddleSense: Reasoning about Riddle Questions Featuring Linguistic... | 20 | Experimental
44 | phosseini/GisPy | GisPy: A Tool for Measuring Gist Inference Score in Text... | 20 | Experimental
45 | THU-KEG/COPEN | The official code and dataset for EMNLP 2022 paper "COPEN: Probing... | 19 | Experimental
46 | MultimodalGeo/GeoText-1652 | An official repo for ECCV 2024 Towards Natural Language-Guided Drones:... | 19 | Experimental
47 | ZhengZixiang/MRCPapers | Worth-reading paper list and other awesome resources on Machine Reading... | 18 | Experimental
48 | mariomeissner/AmbiNLI | This is the code for the paper "Embracing Ambiguity: Shifting the Training... | 17 | Experimental
49 | yul091/UnBED | Codebase for the ACL 2023 paper: "Uncertainty-Aware Bootstrap Learning for... | 17 | Experimental
50 | MSR-LIT/Splash | Release of SPLASH: Dataset for semantic parse correction with natural... | 17 | Experimental
51 | rycolab/evidence-probing | Code and data for the ACL 2022 paper "Probing as Quantifying Inductive Bias". | 16 | Experimental
52 | royxlead/self-diagnosing-neural-models-python | Self-Diagnosing Neural Networks: models that quantify their own uncertainty... | 16 | Experimental
53 | Advancing-Machine-Human-Reasoning-Lab/transformer-psychometrics | Code to reproduce experiments in our *SEM 2021 Paper | 15 | Experimental
54 | maastrichtlawtech/fusion | 🔗 Hybrid retrieval in the legal domain | 14 | Experimental
55 | salesforce/FewXC | Official code and data release for Efficiently Aligned Cross-Lingual... | 14 | Experimental
56 | Raising-hrx/MetGen | An implementation for MetGen: A Module-Based Entailment Tree Generation... | 14 | Experimental
57 | naver/ms-marco-shift | A Fine-Grained Analysis of Distribution Shifts in MSMARCO (MS-Shift).... | 13 | Experimental
58 | LaVi-Lab/C2LEVA | [Findings of ACL 2025] "C2LEVA: Toward Comprehensive and Contamination-Free... | 13 | Experimental
59 | Nativeatom/FRoG | Fuzzy reasoning of Generalized Quantifiers (EMNLP 2024) | 13 | Experimental
60 | megagonlabs/ambignlg | 🐶 Data for AmbigNLG: Addressing Task Ambiguity in Instruction for NLG... | 13 | Experimental
61 | fajri91/discourse_probing | Discourse Probing of Pretrained Language Models. In Proceedings of NAACL 2021. | 13 | Experimental
62 | nlp-waseda/dcsg-ja | Dialogue Commonsense Graph in Japanese | 13 | Experimental
63 | megagonlabs/xatu | 🕊️ Code and Data for XATU: A Fine-grained Instruction-based Benchmark for... | 13 | Experimental
64 | collapseindex/ci-curation | CI-Guided Data Curation: Using prediction instability to detect label noise.... | 12 | Experimental
65 | gianluigilopardo/anchors_text_theory | Code for the paper "A Sea of Words: An In-Depth Analysis of Anchors for Text... | 12 | Experimental
66 | amazon-science/resource-constrained-naturalized-semantic-parsing | This repository is made public for reproducibility of our recent work on... | 12 | Experimental
67 | zhengyima/Anchors | Source code of CIKM2021 Paper 'Pre-training for Ad-hoc Retrieval: Hyperlink... | 12 | Experimental
68 | XInfoTabS/dataset | The Official dataset for "XINFOTABS: Evaluating Multilingual Tabular Natural... | 12 | Experimental
69 | INK-USC/ER-Test | Code for ER-Test, accepted to the Findings of EMNLP 2022 | 12 | Experimental
70 | putmanmodel/putman-model-paper | Preprint + pseudocode for the PUTMAN Model (relational meaning graphs,... | 11 | Experimental
71 | HKUST-KnowComp/atomic-conceptualization | Code and data for the paper Acquiring and Modelling Abstract Commonsense... | 11 | Experimental
72 | IndexFziQ/IIE-NLP-Eyas-SemEval2021 | Code of IIE-NLP-Eyas Team for ReCAM (Task 4) @SemEval2021... | 11 | Experimental
73 | Nativeatom/PRESQUE | The repository for "Pragmatic Reasoning Unlocks Quantifier Semantics for... | 11 | Experimental
74 | dyan-dy/Baidu-LIC2021-MRC | models and codes for baiduAI LIC 2021 MRC tasks, based on paddlenlp | 10 | Experimental
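
In the listing above, every tool scoring 30 or more is labelled Emerging and every tool scoring 29 or less is labelled Experimental. The sketch below checks that pattern on a few rows copied from the listing. The cutoff of 30 is inferred from those rows rather than documented, and the names `ASSUMED_CUTOFF` and `inferred_tier` are hypothetical; tiers above Emerging may exist but do not appear in this category.

```python
from collections import Counter

# A few (tool, score, tier) rows copied from the listing, including the
# two rows on either side of the apparent tier boundary.
ROWS = [
    ("ymcui/cmrc2018", 42, "Emerging"),
    ("utahnlp/knowledge_infotabs", 30, "Emerging"),
    ("yuweihao/reclor", 29, "Experimental"),
    ("dyan-dy/Baidu-LIC2021-MRC", 10, "Experimental"),
]

# Assumption: inferred from the listing, not a documented threshold.
ASSUMED_CUTOFF = 30


def inferred_tier(score):
    """Map a score to the tier label the listing appears to use."""
    return "Emerging" if score >= ASSUMED_CUTOFF else "Experimental"


# Rows whose listed tier disagrees with the inferred rule (none expected here).
mismatches = [row for row in ROWS if inferred_tier(row[1]) != row[2]]

# Number of sample rows per tier.
counts = Counter(tier for _, _, tier in ROWS)
```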