davide-ghidelli-business/OpenCorpus
OpenCorpus is a collection of open-source textual corpora from various languages, designed for easy access and linguistic research. Explore curated datasets with rich metadata and links to valuable resources, all available under public domain licenses.
Stars
1
Forks
—
Language
—
License
CC0-1.0
Category
Last pushed
Mar 18, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/davide-ghidelli-business/OpenCorpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
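The same request can be made from Python using only the standard library. This is a minimal sketch, not the service's documented client: the URL pattern is inferred from the single curl example above, the "nlp" path segment is assumed to be the category, and the function names are hypothetical. The response format is not stated on this page, so the body is returned as plain text.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner: str, repo: str, category: str = "nlp") -> str:
    """Build the request URL for one repository (hypothetical helper)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner: str, repo: str, category: str = "nlp",
                  timeout: float = 10.0) -> str:
    """Fetch the raw response body as text; no key is sent (hypothetical helper)."""
    url = build_quality_url(owner, repo, category)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_quality("davide-ghidelli-business", "OpenCorpus") issues the same GET request as the curl command. Each call counts against the 100 requests/day limit for unauthenticated use.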
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
Library for extracting statistics from Russian-language texts.
natasha/nerus
Large silver-standard Russian corpus with NER, morphology, and syntax markup
darija-open-dataset/dataset
Darija <-> English dataset