DigitalPebble/behemoth

Behemoth is an open-source platform for large-scale document analysis based on Apache Hadoop.

Archived
Score: 48 / 100 (Emerging)

Built on MapReduce, Behemoth provides a modular annotation framework for chaining document processors (Tika, UIMA, GATE, language identification), plus connectors for ingesting from WARC/Nutch sources and exporting to Solr/Mahout. It acts as distributed glueware: it orchestrates existing NLP and ML tools at scale rather than implementing its own algorithms, and relies on Hadoop for fault tolerance and horizontal scalability.
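The chaining idea described above can be illustrated with a minimal sketch. This is not Behemoth's API: `Doc`, `DocProcessor`, `LANGUAGE_ID`, `TOKEN_COUNT`, and `runChain` are hypothetical names invented for illustration, and the two processors are toy stand-ins for real components such as a language identifier or a tokenizer. The sketch only shows the general pattern of passing one document through a sequence of independent annotators, which is the kind of per-record work a map task could perform.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: every type and name here is hypothetical, not a Behemoth class.
final class AnnotationChainSketch {

    // A document record: source URL, raw text, and annotations added by each stage.
    static final class Doc {
        final String url;
        final String text;
        final Map<String, String> annotations = new LinkedHashMap<>();

        Doc(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    // One stage in the chain: takes a document, adds annotations, returns it.
    interface DocProcessor {
        Doc process(Doc doc);
    }

    // Toy stand-in for a language-identification step (assumption: a crude
    // keyword check, nothing like a real language identifier).
    static final DocProcessor LANGUAGE_ID = doc -> {
        String lang = doc.text.contains(" the ") ? "en" : "unknown";
        doc.annotations.put("lang", lang);
        return doc;
    };

    // Toy stand-in for a tokenizer: counts whitespace-separated tokens.
    static final DocProcessor TOKEN_COUNT = doc -> {
        String trimmed = doc.text.trim();
        int count = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        doc.annotations.put("tokens", String.valueOf(count));
        return doc;
    };

    // Runs the processors in order. Each document is handled independently,
    // so a distributed runtime could apply this to one record per map call.
    static Doc runChain(Doc doc, List<DocProcessor> chain) {
        Doc current = doc;
        for (DocProcessor processor : chain) {
            current = processor.process(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Doc doc = new Doc("http://example.com/a", "Behemoth runs the chain on Hadoop");
        Doc result = runChain(doc, List.of(LANGUAGE_ID, TOKEN_COUNT));
        System.out.println(result.annotations); // prints {lang=en, tokens=6}
    }
}
```

Because each stage only reads the document and appends annotations, stages can be reordered or swapped without touching the others, which is the property that makes a "glueware" design practical.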

284 stars. No commits in the last 6 months.

Archived, Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25


Stars: 284
Forks: 59
Language: Java
License:
Last pushed: Apr 25, 2018
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/DigitalPebble/behemoth"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.