superdoc-dev/docx-corpus
The largest open corpus of classified docx documents
35 / 100
Emerging
No Package
No Dependents
Maintenance
13 / 25
Adoption
8 / 25
Maturity
11 / 25
Community
3 / 25
Stars
45
Forks
1
Language
TypeScript
License
MIT
Category
Last pushed
Mar 12, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/superdoc-dev/docx-corpus"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
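The same request can be made from a script. The following is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the path layout (a category segment such as `nlp`, then owner, then repository) is inferred only from the curl example above, and the response is assumed to be JSON because its schema is not described here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, mirroring the path shape of the curl example.

    Assumption: the three path segments are category, owner and repository.
    """
    # Percent-encode each segment so unusual characters cannot alter the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the endpoint returns JSON; the fields are not documented here,
    so the decoded object is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("nlp", "superdoc-dev", "docx-corpus"))
```

Keyless access is limited to 100 requests/day, so a script that polls many repositories would want to cache responses rather than refetch them.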
Higher-rated alternatives
Tiiiger/bert_score
BERT score for text generation
71
DerwenAI/pytextrank
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
69
BrikerMan/Kashgari
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for...
64
asyml/texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. ...
63
yohasebe/wp2txt
A command-line tool to extract plain text from Wikipedia dumps with category and section filtering
57