google-research-datasets/swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
Archived. No commits in the last 6 months.
Stars: 49
Forks: 2
Language: —
License: —
Category
Last pushed: Nov 13, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/google-research-datasets/swim-ir"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
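For scripted use, the request above can be wrapped in two small shell functions. This is a minimal sketch, not a documented client: `quality_url` and `fetch_quality` are hypothetical helper names, the URL path is copied from the curl example, and it is an assumption that other repositories are served under the same path and that errors (such as an exhausted daily quota) come back as HTTP error statuses. The response format is not described on this page, so the sketch does not parse it.

```shell
# Hypothetical helper: build the endpoint URL for an "owner/repo" slug.
# The base path is taken verbatim from the curl example on this page;
# whether it applies to other repositories is an assumption.
quality_url() {
  printf 'https://pt-edge.onrender.com/api/v1/quality/embeddings/%s\n' "$1"
}

# Hypothetical helper: fetch the record for a slug.
# --fail makes curl exit non-zero on an HTTP error status, so a caller
# can detect a failed request instead of reading an error page as data.
fetch_quality() {
  curl --silent --show-error --fail "$(quality_url "$1")"
}
```

Calling `fetch_quality google-research-datasets/swim-ir` issues the same request as the curl command above, and each call counts against the daily request limit.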
Higher-rated alternatives
ibm-self-serve-assets/Watson-NLP
This collection demonstrates how to quickly embed Watson NLP in your own applications.
psychbruce/PsychWordVec
🔜 Integrative Toolbox of Word Embedding Research for Psychological Science.
mobassir94/Multilingual-NLP-for-Islamic-Theology
Cross-lingual language models for building search engines for the Holy Quran and Sahih Hadiths