google-research-datasets/wit
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
Archived. 1,101 stars. No commits in the last 6 months.
Stars: 1,101
Forks: 46
Language: not listed
License: not listed
Category:
Last pushed: Sep 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/google-research-datasets/wit"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
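The same request can be made from a script. The following is a minimal Python sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the path as category/owner/repo is inferred only from the curl example above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    single curl example; the page does not document the scheme.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumption: the response body is JSON. If it is not, json.loads raises
    a ValueError and the raw body should be inspected instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(build_quality_url("ml-frameworks", "google-research-datasets", "wit"))
```

Without a key the stated limit is 100 requests/day, so a script that polls many repositories would likely want to cache responses locally rather than re-request them.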
Higher-rated alternatives
facebookresearch/fairseq2
FAIR Sequence Modeling Toolkit 2
OpenNMT/OpenNMT-tf
Neural machine translation and sequence learning using TensorFlow
lhotse-speech/lhotse
Tools for handling multimodal data in machine learning projects.
awslabs/sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
google/sequence-layers
A neural network layer API and library for sequence modeling, designed for easy creation of...