NLP Resource Collections ML Frameworks

Curated lists, datasets, and reference materials for Natural Language Processing across languages and domains. This category does not include implementations of NLP models, tutorials, or frameworks; it covers only aggregated resources and paper collections.

17 NLP resource collection projects are tracked. One scores above 50 (Established tier). The highest-rated is leomaurodesenv/game-datasets at 52/100 with 1,014 stars. One of the top 10 is actively maintained.

Get all 17 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=nlp-resource-collections&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
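The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the function names `build_url` and `fetch_projects` are hypothetical helpers introduced here, and the shape of the returned JSON is not documented on this page, so the sketch only parses the response and does not assume any particular fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="nlp-resource-collections",
              limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and parse the body as JSON.

    Assumption: the endpoint returns a JSON document. Its structure is
    not described on this page, so the parsed value is returned as-is.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the request without a key, so it counts against the 100 requests/day limit mentioned above.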

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | leomaurodesenv/game-datasets | 🎮 A curated list of awesome game datasets, and tools to... | 52 | Established |
| 2 | jonathanwvd/awesome-industrial-datasets | A curated collection of public industrial datasets. | 48 | Emerging |
| 3 | maastrichtlawtech/awesome-legal-nlp | 📖 A curated list of LegalNLP resources from all around the web. | 42 | Emerging |
| 4 | NTMC-Community/awesome-neural-models-for-semantic-match | A curated list of papers dedicated to neural text (semantic) matching. | 41 | Emerging |
| 5 | jsbroks/awesome-dataset-tools | 🔧 A curated list of awesome dataset tools | 41 | Emerging |
| 6 | haiker2011/awesome-nlp-sentiment-analysis | 📖 A collection of NLP datasets, papers, and open-source implementations, especially for sentiment analysis, emotion cause identification, and opinion target and opinion word extraction. | 40 | Emerging |
| 7 | Jamie-Cui/paper-pulse | Automatically fetch, filter, and summarize research papers from arXiv & IACR... | 39 | Emerging |
| 8 | ml4code/ml4code.github.io | Website for "A Survey of Machine Learning for Big Code and Naturalness" | 34 | Emerging |
| 9 | Huffon/NLP101 | NLP 101: a resource repository for Deep Learning and Natural Language Processing | 30 | Emerging |
| 10 | coteries/cedille-ai | ✒️ Cedille is a large French language model (6B), released under an... | 29 | Experimental |
| 11 | vandroogenbroeckmarc/doi2bib | Tool to convert a DOI to a BiBTeX entry (mainly "adapted" for the computer... | 26 | Experimental |
| 12 | MEgooneh/awesome-Iran-datasets | Iranian/Persian Datasets. | 26 | Experimental |
| 13 | enochkan/awesome-gans-and-deepfakes | A curated list of GAN & Deepfake papers and repositories. | 26 | Experimental |
| 14 | tushartushar/ML4SCA | Machine Learning for Source Code Analysis | 24 | Experimental |
| 15 | sciknoworg/ald-ale-orkg-review | The repository contains code to automate extraction of review tables from... | 19 | Experimental |
| 16 | bdqnghi/awesome-ai4code | A collection of recent papers, benchmarks and datasets of AI4Code domain. | 17 | Experimental |
| 17 | nlx-group/study-of-commonsense-reasoning | Code and data for Masters Dissertation "A Study of Commonsense Reasoning... | 10 | Experimental |
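
The tier labels above appear to follow score cutoffs. The Python sketch below shows one mapping that is consistent with the listed scores. The cutoffs are assumptions inferred from this listing: the page only states that Established means a score above 50, and the Emerging/Experimental boundary is inferred from the neighbouring scores 30 (Emerging) and 29 (Experimental). The function name `tier_for` is hypothetical.

```python
from collections import Counter


def tier_for(score, established_above=50, emerging_min=30):
    """Map a 0-100 score to a tier label.

    Assumed cutoffs (inferred from the listing, not documented):
    strictly above 50 is Established, 30 to 50 is Emerging,
    and anything below 30 is Experimental.
    """
    if score > established_above:
        return "Established"
    if score >= emerging_min:
        return "Emerging"
    return "Experimental"


# Scores of the 17 listed projects, in rank order.
LISTED_SCORES = [52, 48, 42, 41, 41, 40, 39, 34, 30,
                 29, 26, 26, 26, 24, 19, 17, 10]

# Count how many listed projects fall into each tier.
tier_counts = Counter(tier_for(s) for s in LISTED_SCORES)
```

Under these assumed cutoffs, `tier_counts` reproduces the tier labels shown for all 17 listed projects; how a score of exactly 50 is treated is not stated on the page.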