CeON/CERMINE

Content ExtRactor and MINEr

Overall score: 50 / 100 (Established)

CERMINE extracts metadata, full text, and parsed references from academic PDFs using machine learning models (CRF-based parsers) and outputs the results in NLM JATS format. It is available as a standalone JAR, a Maven library, or a REST API, and supports both batch processing of documents and granular extraction of specific components such as reference strings and affiliation data. It is written in Java and relies on trained models for document structure analysis and information extraction from scholarly publications.
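As a rough illustration of the REST route mentioned above, the sketch below posts a PDF to an extraction endpoint and returns the response body as text. The endpoint URL is deliberately left as a parameter: the path `extract.do` and the `application/binary` content type are assumptions based on how the project's web service has been documented, and should be checked against the deployment you actually use. The function names are hypothetical.

```python
import urllib.request


def build_extract_request(pdf_bytes: bytes, endpoint: str) -> urllib.request.Request:
    """Build a POST request carrying the raw PDF bytes.

    Assumption: the service accepts the PDF as the request body with
    Content-Type "application/binary".
    """
    return urllib.request.Request(
        endpoint,
        data=pdf_bytes,
        headers={"Content-Type": "application/binary"},
        method="POST",
    )


def extract_jats(pdf_path: str, endpoint: str, timeout: float = 120.0) -> str:
    """Send the PDF at pdf_path to the endpoint and return the response text.

    Assumption: the response body is the NLM JATS XML encoded as UTF-8.
    """
    with open(pdf_path, "rb") as fh:
        request = build_extract_request(fh.read(), endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A call might look like `extract_jats("article.pdf", "http://localhost:8080/extract.do")`, where the host and port are placeholders for wherever the service is running.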

513 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 24 / 25


Stars: 513
Forks: 99
Language: Java
License: AGPL-3.0
Last pushed: Jun 30, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CeON/CERMINE"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
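The same request can be made from a script. The following is a minimal sketch using only the Python standard library. It assumes the path layout `/quality/{category}/{owner}/{repo}` generalizes from the single curl example above, and that the response body is JSON; neither is confirmed here, and the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is an assumed pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (assumed layout)."""
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + path


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the quality record and decode it, assuming a JSON response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For this project the call would be `fetch_quality("nlp", "CeON", "CERMINE")`. With the unauthenticated limit of 100 requests/day, it is worth caching results locally rather than re-fetching on every run.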