angelosalatino/cso-classifier

Python library that classifies content from scientific papers with the topics of the Computer Science Ontology (CSO).

Score: 71 / 100 (Verified)

Uses a three-stage pipeline: syntactic matching (direct extraction of CSO concepts from the text), semantic inference via word embeddings and part-of-speech tagging, and post-processing with outlier removal and hierarchical enhancement. Supports single-paper and batch classification, with optional filtering by CSO concept category. Relies on spaCy for NLP, NLTK for preprocessing, and pre-computed word2vec embeddings with cached semantic mappings for efficient inference.
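To make the pipeline shape concrete, here is a minimal, self-contained sketch of two of the stages described above: syntactic matching and hierarchical enhancement. It is an illustration only, not the library's implementation. The tiny ontology (`TOY_BROADER`) and the function names `syntactic_match`, `enhance`, and `classify` are hypothetical, and the semantic (embedding-based) stage and outlier removal are left out.

```python
from itertools import chain

# Hypothetical toy ontology: topic -> broader (parent) topics.
# The real CSO is far larger; these entries are illustrative only.
TOY_BROADER = {
    "neural networks": ["machine learning"],
    "machine learning": ["artificial intelligence"],
    "semantic web": ["world wide web"],
}
TOY_TOPICS = set(TOY_BROADER) | set(chain.from_iterable(TOY_BROADER.values()))


def ngrams(tokens, max_n=3):
    """Yield every word n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def syntactic_match(text):
    """Return the toy-ontology topics that appear verbatim in the text."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return {gram for gram in ngrams(tokens) if gram in TOY_TOPICS}


def enhance(topics, levels=1):
    """Return broader topics reachable within `levels` steps, excluding the inputs."""
    added = set()
    frontier = set(topics)
    for _ in range(levels):
        frontier = {parent for t in frontier for parent in TOY_BROADER.get(t, [])}
        added |= frontier
    return added - set(topics)


def classify(text):
    """Run the two sketched stages and report their results separately."""
    syntactic = syntactic_match(text)
    return {
        "syntactic": sorted(syntactic),
        "enhanced": sorted(enhance(syntactic)),
    }
```

Calling `classify("We train neural networks for the semantic web.")` in this sketch matches the two topics named in the sentence and then adds their direct parents from the toy ontology as enhanced topics.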

Available on PyPI.

Maintenance 13 / 25
Adoption 14 / 25
Maturity 25 / 25
Community 19 / 25

Stars: 95
Forks: 19
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Monthly downloads: 243
Commits (30d): 0
Dependencies: 12

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/angelosalatino/cso-classifier"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
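The same request can be made from Python using only the standard library. This is a rough sketch under two assumptions that the page does not confirm: that the URL follows a `<category>/<owner>/<repo>` pattern (inferred from the single example above) and that the endpoint returns a JSON body. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the endpoint URL; the <category>/<owner>/<repo> layout is an assumption."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category, repo, timeout=10):
    """Fetch the quality record; assumes the response body is JSON."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "angelosalatino/cso-classifier")
```

With the keyless limit of 100 requests/day, caching the result locally is a reasonable choice if the data is polled regularly.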