JohnSnowLabs/spark-nlp
State of the Art Natural Language Processing
Builds on Apache Spark for distributed NLP at scale, with 100,000+ pretrained pipelines and models across 200+ languages. Runs transformer architectures (BERT, RoBERTa, GPT-2, Llama, etc.) natively on the JVM (Java, Scala, Kotlin) and imports models from TensorFlow, ONNX, OpenVINO, and GGUF formats. Covers end-to-end tasks including tokenization, embeddings, NER, machine translation, question answering, image captioning, and speech recognition.
4,116 stars. Actively maintained with 19 commits in the last 30 days.
Stars: 4,116
Forks: 739
Language: Scala
License: Apache-2.0
Category:
Last pushed: Mar 12, 2026
Commits (30d): 19
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/JohnSnowLabs/spark-nlp"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
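The same request can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the path layout `category/owner/repo` is inferred from the single URL shown above, the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed (not confirmed by this page) to be JSON. How an API key would be supplied is not documented here, so the sketch makes keyless requests only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL on this page.
    """
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the body is JSON. Keyless access is limited to
    100 requests/day, so callers should cache results.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("nlp", "JohnSnowLabs", "spark-nlp"))
```

Keeping URL construction separate from the network call makes it possible to check the URL offline and to cache responses against the daily request limit.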
Related tools
JohnSnowLabs/nlu
1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and...
dipanjanS/nlp_workshop_odsc_europe20
Extensive tutorials for the Advanced NLP Workshop in Open Data Science Conference Europe 2020....
aaBadri/nlp-papers
Must-read papers on Natural Language Processing (NLP)
jairNeto/warren_buffet_letters
Repository using NLP techniques such as Transformers, Frequency analysis, document similarity at...
DmitryRyumin/EMNLP-2023-Papers
EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for...