JohnSnowLabs/spark-nlp

State of the Art Natural Language Processing

70 / 100
Verified

Builds on Apache Spark to run NLP distributed at scale, and offers 100,000+ pretrained pipelines and models across 200+ languages. Runs transformer architectures (BERT, RoBERTa, GPT-2, Llama, and others) natively on the JVM (Java, Scala, Kotlin) and can import models in TensorFlow, ONNX, OpenVINO, and GGUF formats. Covers end-to-end tasks including tokenization, embeddings, named entity recognition (NER), machine translation, question answering, image captioning, and speech recognition.

4,116 stars. Actively maintained, with 19 commits in the last 30 days.

No package, no dependents
Maintenance 20 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 24 / 25
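
The four category scores add up to the overall score. This suggests that the total is a simple unweighted sum of four categories worth 25 points each, although the page does not state the formula:

20 (Maintenance) + 10 (Adoption) + 16 (Maturity) + 24 (Community) = 70 / 100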


Stars: 4,116
Forks: 739
Language: Scala
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 19

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/JohnSnowLabs/spark-nlp"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.