google-research-datasets/wit
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
Archived. 1,101 stars. No commits in the last 6 months.
Stars: 1,101
Forks: 46
Language: not specified
License: not specified
Category:
Last pushed: Sep 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/google-research-datasets/wit"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
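The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the URL pattern is `/api/v1/quality/<category>/<owner>/<repo>`, as inferred from the single curl example above, and that the response body is JSON; neither is confirmed by this page. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the sketch does not pass an API key because the page does not say how a key is supplied.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumption: the body is JSON. If it does not parse, the raw text is
    returned so the caller can inspect what the server actually sent.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("nlp", "google-research-datasets", "wit")` issues the same GET request as the curl command and counts against the daily request limit.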
Higher-rated alternatives
nlp-uoregon/trankit
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
UBC-NLP/turjuman
TURJUMAN, a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA).
sagorbrur/codeswitch
CodeSwitch is an NLP tool that can be used for language identification, POS tagging, named entity...
nusnlp/esc
The official code of the "Frustratingly Easy System Combination for Grammatical Error Correction" paper
nusnlp/greco
The official code for the "System Combination via Quality Estimation for Grammatical Error...