CeON/CERMINE

Content ExtRactor and MINEr

Established

Extracts metadata, full text, and parsed references from academic PDFs using machine-learning models (including CRF-based parsers) and outputs the results in NLM JATS format. CERMINE is available as a standalone JAR, a Maven library, or a REST API, and supports batch processing of documents as well as targeted extraction of specific components such as reference strings and affiliation data. It is written in Java and uses trained models for document structure analysis and information extraction from scholarly publications.
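Because CERMINE's output is NLM JATS XML, downstream code usually reads fields such as the article title and the reference list straight from that XML. The sketch below uses only the JDK's built-in DOM and XPath APIs and does not call CERMINE itself. The class name `JatsSummary` and the sample document are hypothetical. The element paths follow the common JATS layout (`article-meta/title-group/article-title`, `ref-list/ref`), but real CERMINE output may be structured differently, so check these paths against actual output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class JatsSummary {

    // Hand-written stand-in for a JATS document; NOT real CERMINE output.
    static final String SAMPLE =
        "<article>"
        + "<front><article-meta><title-group>"
        + "<article-title>A Hypothetical Paper Title</article-title>"
        + "</title-group></article-meta></front>"
        + "<back><ref-list>"
        + "<ref id=\"ref1\"><mixed-citation>First reference string.</mixed-citation></ref>"
        + "<ref id=\"ref2\"><mixed-citation>Second reference string.</mixed-citation></ref>"
        + "</ref-list></back>"
        + "</article>";

    // Parses XML into a DOM, skipping any external DTD named in a DOCTYPE.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Title of the article itself (front matter), not titles of cited works.
    static String extractTitle(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(
            "/article/front/article-meta/title-group/article-title", parse(xml)).trim();
    }

    // Number of <ref> entries in the back-matter reference list.
    static int countReferences(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate(
            "count(/article/back/ref-list/ref)", parse(xml), XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractTitle(SAMPLE));     // A Hypothetical Paper Title
        System.out.println(countReferences(SAMPLE));  // 2
    }
}
```

Absolute XPath paths are used instead of a bare `getElementsByTagName("article-title")` because JATS citations can contain their own `article-title` elements. Turning off external DTD loading stops the parser from fetching the JATS DTD over the network when a DOCTYPE is present. That feature URI applies to Xerces-based parsers, including the one bundled with the JDK.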

513 stars. No commits in the last 6 months.

Stale (6 months), No package, No dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 24 / 25

Stars

513

Language

Java

License

AGPL-3.0

Related tools

kermitt2/entity-fishing

A machine learning tool for fishing entities

vinhkhuc/JFastText

Java interface for fastText

rosette-api/java

Babel Street Analytics Client Library for Java

vinhkhuc/jcrfsuite

Java interface for CRFsuite: http://www.chokkan.org/software/crfsuite/

TechPrimers/core-nlp-example

Natural Language Processing Example using Stanford's Core NLP Java Library