CeON/CERMINE
Content ExtRactor and MINEr
Extracts metadata, full text, and parsed references from academic PDFs using machine learning models (CRF-based parsers) and outputs the results in NLM JATS format. Available as a standalone JAR, a Maven library, or a REST API; it supports batch processing of documents and granular extraction of specific components such as reference strings and affiliation data. Written in Java, with trained models for document structure analysis and information extraction from scholarly publications.
513 stars. No commits in the last 6 months.
Stars: 513
Forks: 99
Language: Java
License: AGPL-3.0
Category:
Last pushed: Jun 30, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CeON/CERMINE"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
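The same request can be made from a script. The following is a minimal sketch, not a documented client: it assumes the URL path follows the `category/owner/repo` pattern seen in the curl example above and that the endpoint returns JSON. Neither assumption is confirmed by this page, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path layout is category/owner/repo, inferred from the
    single curl example on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the data for one repository.

    Assumption: the response body is JSON; its schema is not documented here,
    so the parsed object is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to perform the request.
    print(quality_url("nlp", "CeON", "CERMINE"))
```

Calling `fetch_quality("nlp", "CeON", "CERMINE")` performs the network request, so it counts against the daily request limit.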
Related tools
kermitt2/entity-fishing
A machine learning tool for fishing entities
vinhkhuc/JFastText
Java interface for fastText
rosette-api/java
Babel Street Analytics Client Library for Java
vinhkhuc/jcrfsuite
Java interface for CRFsuite: http://www.chokkan.org/software/crfsuite/
TechPrimers/core-nlp-example
Natural Language Processing Example using Stanford's Core NLP Java Library