grobidOrg/grobid

Machine learning software for extracting information from scholarly documents

Score: 70 / 100 (Verified)

Extracts fine-grained metadata and document structure from PDFs using conditional random fields (CRF), with optional deep learning models via DeLFT. Output is structured XML/TEI built from 68 semantic labels covering bibliographic metadata and full-text structures, optionally annotated with bounding-box coordinates. Integrates with CrossRef and biblio-glutton for reference consolidation, supports batch processing and a REST API, and runs in production at ResearchGate, Semantic Scholar, and Internet Archive Scholar.
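To make the REST API concrete, here is a minimal Python sketch (standard library only) that builds, without sending, a multipart POST for full-text extraction. It assumes a GROBID server running locally on its usual port 8070, the `/api/processFulltextDocument` endpoint, the `input` form field for the PDF, and the `consolidateCitations` field (where `"1"` requests reference consolidation). These names are taken from the GROBID service documentation as understood here, so verify them against the docs for your GROBID version. The helper name `build_fulltext_request` is hypothetical.

```python
import uuid
import urllib.request


def build_fulltext_request(pdf_bytes, filename,
                           base_url="http://localhost:8070",
                           consolidate_citations="1"):
    """Build (but do not send) a multipart POST to GROBID's full-text endpoint.

    Assumptions: a GROBID server at base_url, the /api/processFulltextDocument
    path, the "input" file field and the "consolidateCitations" form field.
    """
    boundary = "----grobid" + uuid.uuid4().hex
    crlf = b"\r\n"
    body = bytearray()

    # Plain form field: ask the server to consolidate parsed references
    body += f"--{boundary}".encode() + crlf
    body += b'Content-Disposition: form-data; name="consolidateCitations"' + crlf + crlf
    body += consolidate_citations.encode() + crlf

    # File field: the PDF bytes go in the form field named "input"
    body += f"--{boundary}".encode() + crlf
    body += f'Content-Disposition: form-data; name="input"; filename="{filename}"'.encode() + crlf
    body += b"Content-Type: application/pdf" + crlf + crlf
    body += pdf_bytes + crlf

    # Closing boundary marks the end of the multipart body
    body += f"--{boundary}--".encode() + crlf

    return urllib.request.Request(
        url=f"{base_url}/api/processFulltextDocument",
        data=bytes(body),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Against a running server, passing the request to `urllib.request.urlopen` should return the TEI XML document; building the request separately keeps the encoding step easy to inspect and test offline.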

4,703 stars. Actively maintained with 22 commits in the last 30 days.

No Package · No Dependents
Maintenance 23 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25
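The four sub-scores are each out of 25, and they add up exactly to the headline figure (23 + 10 + 16 + 21 = 70 of a possible 100). A short sketch of that arithmetic follows; note that treating the headline score as a plain sum is an assumption inferred from these numbers, not a documented formula, and the names `SUB_SCORES`, `total_score`, and `max_score` are illustrative.

```python
# Sub-scores as listed on the card; each category is scored out of 25
SUB_SCORES = {"Maintenance": 23, "Adoption": 10, "Maturity": 16, "Community": 21}


def total_score(sub_scores):
    """Assumption: the headline score is the unweighted sum of the sub-scores."""
    return sum(sub_scores.values())


def max_score(sub_scores, per_category=25):
    """Maximum attainable score: 25 points per category."""
    return per_category * len(sub_scores)


print(f"{total_score(SUB_SCORES)} / {max_score(SUB_SCORES)}")  # prints "70 / 100"
```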


Stars: 4,703
Forks: 538
Language: Java
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 22

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/grobidOrg/grobid"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
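The same call can be made from Python with only the standard library. This sketch mirrors the curl example without an API key. It assumes the path layout `/quality/<category>/<owner>/<repo>` read off that single URL (with `nlp` as the category segment) and that the response is JSON; its fields are not described here, so the parsed value is returned as-is. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    # Path layout copied from the curl example: /quality/<category>/<owner>/<repo>
    # (the meaning of the "category" segment is an assumption)
    segments = (urllib.parse.quote(s, safe="") for s in (category, owner, repo))
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category, owner, repo, timeout=10.0):
    # Unauthenticated request (counts toward the 100 requests/day limit);
    # the body is assumed to be JSON and is returned without interpretation
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For this project, `fetch_quality("nlp", "grobidOrg", "grobid")` requests the same URL as the curl command.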