Structured Data Inference NLP Tools

Datasets and benchmarks for natural language inference (NLI), table understanding, text-to-SQL, and instruction-following tasks involving structured or semi-structured data. This category does not include general sentiment analysis, classification tasks without a structured reasoning component, or commonsense knowledge resources without explicit inference evaluation.

There are 74 structured data inference tools tracked. The highest-rated is ymcui/cmrc2018 at 42/100 with 451 stars.

Get all 74 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=structured-data-inference&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
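The same request can be made from Python with only the standard library. This is a minimal sketch: the endpoint and query parameters are copied from the curl command above, while the function names `build_url` and `fetch_projects` are hypothetical helpers introduced here. The shape of the returned JSON is not documented on this page, so the code only decodes it and makes no assumption about its fields. Whether a larger `limit` (for example 74) returns every project in one call is also an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL with the same parameters as the curl command."""
    params = {
        "domain": "nlp",
        "subcategory": "structured-data-inference",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Send the GET request and decode the JSON body.

    The structure of the payload is not described on this page,
    so it is returned as-is for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs one unauthenticated request, which counts against the 100 requests/day quota mentioned above.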

#	Tool	Score	Tier	Stars	Language
1	ymcui/cmrc2018 A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)	42	Emerging	451	Python
2	princeton-nlp/DensePhrases [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021:...	38	Emerging	606	Python
3	thunlp/MultiRD Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"	38	Emerging	111	Python
4	IndexFziQ/KMRC-Papers A list of recent papers regarding knowledge-based machine reading comprehension.	35	Emerging	42	—
5	danqi/rc-cnn-dailymail CNN/Daily Mail Reading Comprehension Task	33	Emerging	292	Python
6	declare-lab/CIDER This repository contains the dataset and the pytorch implementations of the...	32	Emerging	27	Python
7	maastrichtlawtech/gdsr 🕸️ A graph-augmented dense statute retriever. (EACL 2023)	32	Emerging	25	Python
8	intfloat/SimKGC ACL 2022, SimKGC: Simple Contrastive Knowledge Graph Completion with...	32	Emerging	213	Python
9	zjunlp/MKG_Analogy [ICLR 2023] Multimodal Analogical Reasoning over Knowledge Graphs	32	Emerging	132	Python
10	ShiZhengyan/StepGame [AAAI 2022] Dataset and pytorch codes for the paper titled "StepGame: A New...	32	Emerging	32	Python
11	shmsw25/AmbigQA An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous...	31	Emerging	121	Python
12	GeekDream-x/IDOL Repo for paper "IDOL: Indicator-oriented Logic Pre-training for Logical...	30	Emerging	22	Python
13	IndexFziQ/MSMARCO-MRC-Analysis Analysis on the MS-MARCO leaderboard regarding the machine reading...	30	Emerging	21	—
14	utahnlp/knowledge_infotabs Repository containing code for the NAACL 2021 paper (Incorporating External...	30	Emerging	17	Python
15	yuweihao/reclor Code for "ReClor: A Reading Comprehension Dataset Requiring Logical...	29	Experimental	83	Python
16	XingLuxi/KMRC-Research-Archive 🗂 Research about Knowledge-based Machine Reading Comprehension	28	Experimental	24	—
17	phanxuanphucnd/Active-learning-in-NLP Active learning in NLP	28	Experimental	14	Python
18	FeiWang96/GTR [SIGIR 2021] Retrieving Complex Tables with Multi-Granular Graph...	27	Experimental	48	Python
19	amazon-science/pizza-semantic-parsing-dataset The PIZZA dataset continues the exploration of task-oriented parsing by...	27	Experimental	20	Python
20	anshitag/memit_csk Source repository for Editing Common Sense in Transformers (EMNLP 2023)	27	Experimental	6	Python
21	webis-de/acl22-revisiting-uncertainty-based-query-strategies-for-active-learning-with-transformers Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers	27	Experimental	4	Python
22	marceljahnke/negative-cache PyTorch Implementation of the Paper "Efficient Training of Retrieval Models...	26	Experimental	7	Python
23	amazon-science/wqa-multi-sentence-inference This repository contains code used for our Multi Sentence Inference NAACL'22 paper.	25	Experimental	12	Python
24	ymcui/expmrc ExpMRC: Explainability Evaluation for Machine Reading Comprehension	25	Experimental	62	Python
25	sherlcok314159/ChineseMRC-Data A collection of the extractive MRC datasets available so far for Chinese	25	Experimental	122	—
26	thunlp/CokeBERT CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced...	25	Experimental	31	Python
27	acidAnn/semeval2022_task7_starter_kit 💡 Starter kit for SemEval 2022 Task 7: Identifying Plausible...	25	Experimental	4	Python
28	USSiamaboat/polytuplet-loss A Reverse Approach to Training Reading Comprehension and Logical Reasoning Models	24	Experimental	3	Python
29	humanlab/rare-class-AL AL for rare class strategies compared in the paper "Transfer and Active...	24	Experimental	4	Python
30	ict-bigdatalab/CorpusBrain CIKM 2022: CorpusBrain: Pre-train a Generative Retrieval Model for...	24	Experimental	34	Python
31	ai-systems/tg2022task_premise_retrieval TextGraphs Shared Task on Natural Language Premise Selection	24	Experimental	4	Python
32	Jordy-VL/uncertainty-bench Code repository for Benchmarking Scalable Predictive Uncertainty in Text...	24	Experimental	4	Jupyter Notebook
33	semeval-2026-kclarity/clarity Code release for KCLarity at SemEval-2026 Task 6: Encoder and Zero-Shot...	24	Experimental	2	Python
34	Dibyakanti/AutoTNLI-code This repository contains the official code for the paper : Realistic Data...	23	Experimental	6	HTML
35	testzer0/AmbiQT Code and Assets for "Benchmarking and Improving Text-to-SQL Generation Under...	22	Experimental	9	Python
36	psunlpgroup/XSemPLR Data and code for ACL 2023 paper XSemPLR: Cross-Lingual Semantic Parsing in...	22	Experimental	9	Shell
37	ZeinabAghahadi/Syllogistic-Commonsense-Reasoning Deductive Commonsense Reasoning	21	Experimental	8	Jupyter Notebook
38	pietrolesci/anchoral This is the official PyTorch implementation for our NAACL 2024 paper:...	21	Experimental	22	Python
39	krystalan/Multi-hopRC 📔 Notes for Multi-hop Reading Comprehension...	21	Experimental	90	—
40	minnesotanlp/infoVerse Jaehyung Kim et al's ACL 2023 paper on "infoVerse: A Universal Framework for...	20	Experimental	16	Python
41	Pzoom522/xANLG Data and code for "Understanding Linearity of Cross-Lingual Word Embedding...	20	Experimental	12	Python
42	cognitiveailab/tg2021task Participant Kit for the TextGraphs-15 Shared Task on Explanation Regeneration	20	Experimental	19	Python
43	INK-USC/RiddleSense RiddleSense: Reasoning about Riddle Questions Featuring Linguistic...	20	Experimental	13	Python
44	phosseini/GisPy GisPy: A Tool for Measuring Gist Inference Score in Text...	20	Experimental	13	Assembly
45	THU-KEG/COPEN The official code and dataset for EMNLP 2022 paper "COPEN: Probing...	19	Experimental	21	Python
46	MultimodalGeo/GeoText-1652 An official repo for ECCV 2024 Towards Natural Language-Guided Drones:...	19	Experimental	114	Python
47	ZhengZixiang/MRCPapers Worth-reading paper list and other awesome resources on Machine Reading...	18	Experimental	27	—
48	mariomeissner/AmbiNLI This is the code for the paper "Embracing Ambiguity: Shifting the Training...	17	Experimental	5	Jupyter Notebook
49	yul091/UnBED Codebase for the ACL 2023 paper: "Uncertainty-Aware Bootstrap Learning for...	17	Experimental	5	Python
50	MSR-LIT/Splash Release of SPLASH: Dataset for semantic parse correction with natural...	17	Experimental	42	—
51	rycolab/evidence-probing Code and data for the ACL 2022 paper "Probing as Quantifying Inductive Bias".	16	Experimental	3	Python
52	royxlead/self-diagnosing-neural-models-python Self-Diagnosing Neural Networks: models that quantify their own uncertainty...	16	Experimental	1	Jupyter Notebook
53	Advancing-Machine-Human-Reasoning-Lab/transformer-psychometrics Code to reproduce experiments in our *SEM 2021 Paper	15	Experimental	2	Python
54	maastrichtlawtech/fusion 🔗 Hybrid retrieval in the legal domain	14	Experimental	10	Python
55	salesforce/FewXC Official code and data release for Efficiently Aligned Cross-Lingual...	14	Experimental	3	Python
56	Raising-hrx/MetGen An implementation for MetGen: A Module-Based Entailment Tree Generation...	14	Experimental	13	Python
57	naver/ms-marco-shift A Fine-Grained Analysis of Distribution Shifts in MSMARCO (MS-Shift)....	13	Experimental	6	Jupyter Notebook
58	LaVi-Lab/C2LEVA [Findings of ACL 2025] "C2LEVA: Toward Comprehensive and Contamination-Free...	13	Experimental	2	—
59	Nativeatom/FRoG Fuzzy reasoning of Generalized Quantifiers (EMNLP 2024)	13	Experimental	8	Python
60	megagonlabs/ambignlg 🐶 Data for AmbigNLG: Addressing Task Ambiguity in Instruction for NLG...	13	Experimental	6	Python
61	fajri91/discourse_probing Discourse Probing of Pretrained Language Models. In Proceedings of NAACL 2021.	13	Experimental	10	Jupyter Notebook
62	nlp-waseda/dcsg-ja Dialogue Commonsense Graph in Japanese	13	Experimental	6	—
63	megagonlabs/xatu 🕊️ Code and Data for XATU: A Fine-grained Instruction-based Benchmark for...	13	Experimental	6	Python
64	collapseindex/ci-curation CI-Guided Data Curation: Using prediction instability to detect label noise....	12	Experimental	1	Jupyter Notebook
65	gianluigilopardo/anchors_text_theory Code for the paper "A Sea of Words: An In-Depth Analysis of Anchors for Text...	12	Experimental	14	Python
66	amazon-science/resource-constrained-naturalized-semantic-parsing This repository is made public for reproducibility of our recent work on...	12	Experimental	3	—
67	zhengyima/Anchors Source code of CIKM2021 Paper 'Pre-training for Ad-hoc Retrieval: Hyperlink...	12	Experimental	16	Python
68	XInfoTabS/dataset The Official dataset for "XINFOTABS: Evaluating Multilingual Tabular Natural...	12	Experimental	3	Python
69	INK-USC/ER-Test Code for ER-Test, accepted to the Findings of EMNLP 2022	12	Experimental	3	Python
70	putmanmodel/putman-model-paper Preprint + pseudocode for the PUTMAN Model (relational meaning graphs,...	11	Experimental	—	—
71	HKUST-KnowComp/atomic-conceptualization Code and data for the paper Acquiring and Modelling Abstract Commonsense...	11	Experimental	23	Python
72	IndexFziQ/IIE-NLP-Eyas-SemEval2021 Code of IIE-NLP-Eyas Team for ReCAM (Task 4) @SemEval2021...	11	Experimental	2	Python
73	Nativeatom/PRESQUE The repository for "Pragmatic Reasoning Unlocks Quantifier Semantics for...	11	Experimental	2	Python
74	dyan-dy/Baidu-LIC2021-MRC Models and code for baiduAI LIC 2021 MRC tasks, based on PaddleNLP	10	Experimental	1	Python

Comparisons in this category

KMRC-Papers and KMRC-Research-Archive (35 vs 28)