adbar/German-NLP
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
**Technical Summary:** Organizes German NLP resources by linguistic processing layer: text corpora (general-purpose, historical, specialized); tokenization, lemmatization, POS tagging, and syntactic parsing; semantic analysis (embeddings, sentiment, coreference); and speech and translation tasks. Emphasizes immediately usable, actively maintained tools that integrate with frameworks such as spaCy and with transformer models (BERT variants), alongside annotated datasets and treebanks covering diverse German linguistic phenomena. Also covers specialized domains (legal, parliamentary, medical), dialectal variants such as Swiss German, and historical stages of the language (750 to the present), favoring practical deployability over academic completeness.
518 stars. No commits in the last 6 months.
Stars: 518
Forks: 66
Language: —
License: —
Category:
Last pushed: Oct 30, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/adbar/German-NLP"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
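The curl call above can be wrapped in a small script for repeated lookups. The following is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the path follows a `category/owner/repo` pattern is inferred only from the single URL shown above. The response format is not documented here, so the sketch returns the body unmodified.

```shell
#!/bin/sh
# Base URL copied from the curl command above.
BASE_URL="https://pt-edge.onrender.com/api/v1/quality"

# quality_url CATEGORY OWNER/REPO
# Hypothetical helper: builds the request URL, assuming the path
# pattern <category>/<owner>/<repo> seen in the example above.
quality_url() {
    printf '%s/%s/%s\n' "$BASE_URL" "$1" "$2"
}

# fetch_quality CATEGORY OWNER/REPO
# Hypothetical helper: prints the raw response body.
# -f makes curl exit non-zero on HTTP error responses,
# -sS hides the progress meter but still reports errors.
fetch_quality() {
    curl -fsS "$(quality_url "$1" "$2")"
}
```

After sourcing the script, `fetch_quality nlp adbar/German-NLP` should issue the same request as the command above. Checking curl's exit status before using the output is a reasonable precaution given the daily request limit.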
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
Library for extracting statistics from Russian-language texts.
natasha/nerus
Large silver-standard Russian corpus with NER, morphology and syntax markup
darija-open-dataset/dataset
Darija <-> English dataset