grobidOrg/grobid
Machine learning software for extracting information from scholarly documents
Extracts fine-grained metadata from PDFs using conditional random fields (CRF), with optional deep learning models via DeLFT. Outputs structured XML/TEI with 68 semantic labels covering bibliographic data and full-text structures, plus optional bounding-box coordinates. Integrates with CrossRef and biblio-glutton for reference consolidation, supports batch processing and a REST API, and is used in production by ResearchGate, Semantic Scholar, and Internet Archive Scholar.
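A minimal sketch of the REST workflow, using only the Python standard library: build a full-text request for a PDF, then read bounding boxes back out of the returned TEI. It assumes a GROBID server on its default port 8070 at `localhost`. The endpoint `/api/processFulltextDocument`, the `input` file field, the repeatable `teiCoordinates` parameter, and the `coords` format (`page,x,y,width,height`, several boxes joined by `;`) follow the GROBID service documentation; verify them against the version you deploy. The helper names (`build_fulltext_request`, `parse_coords`, `author_boxes`, `Box`) are hypothetical.

```python
import uuid
import urllib.request
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Assumption: a GROBID server running locally on its default port.
GROBID_URL = "http://localhost:8070"
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def build_fulltext_request(pdf_bytes: bytes, filename: str = "paper.pdf",
                           coord_elements=("persName", "figure")) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /api/processFulltextDocument."""
    boundary = uuid.uuid4().hex
    parts = []

    def field(name: str, value: str) -> None:
        # One plain-text multipart form field.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )

    # Ask GROBID to attach bounding-box coordinates to these TEI elements.
    for element in coord_elements:
        field("teiCoordinates", element)
    # The PDF itself goes in the "input" form field.
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="input"; '
         f'filename="{filename}"\r\nContent-Type: application/pdf\r\n\r\n').encode()
        + pdf_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{GROBID_URL}/api/processFulltextDocument",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


@dataclass
class Box:
    page: int
    x: float
    y: float
    width: float
    height: float


def parse_coords(coords: str) -> list[Box]:
    """Parse a TEI coords attribute: 'page,x,y,w,h' boxes separated by ';'."""
    boxes = []
    for chunk in coords.split(";"):
        if chunk.strip():
            page, x, y, w, h = chunk.split(",")
            boxes.append(Box(int(page), float(x), float(y), float(w), float(h)))
    return boxes


def author_boxes(tei_xml: str) -> dict[str, list[Box]]:
    """Map each author's surname to the boxes of its <persName> element."""
    root = ET.fromstring(tei_xml)
    result = {}
    for pers in root.iterfind(".//tei:persName[@coords]", TEI_NS):
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
        result[surname] = parse_coords(pers.get("coords"))
    return result
```

To run it against a live server, pass the request to `urllib.request.urlopen`, decode the response body as UTF-8, and hand the TEI string to `author_boxes`. Building the multipart body by hand keeps the sketch dependency-free; an HTTP client such as `requests` can assemble it for you instead.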
4,703 stars. Actively maintained with 22 commits in the last 30 days.
Stars: 4,703
Forks: 538
Language: Java
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 22
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/grobidOrg/grobid"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
Related tools
obss/jury
Comprehensive NLP Evaluation System
lihanghang/NLP-Knowledge-Graph
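Research and applications in natural language processing, knowledge graphs, dialogue systems, large models, and related technologies.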
yzhangcs/parser
State-of-the-art parsers for natural language.
alibaba/EasyNLP
EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
octanove/janlpbook-code
Public code for the book "Introduction to Japanese Natural Language Processing"