angelosalatino/cso-classifier
Python library that classifies content from scientific papers with the topics of the Computer Science Ontology (CSO).
Employs a three-stage pipeline combining syntactic matching (direct CSO concept extraction), semantic inference via word embeddings and part-of-speech tagging, and post-processing with outlier removal and hierarchical enhancement. Supports both single-paper and batch classification modes with optional filtering by CSO concept categories. Integrates spaCy for NLP, NLTK for preprocessing, and leverages pre-computed word2vec embeddings alongside cached semantic mappings for efficient inference.
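To make the pipeline description more concrete, here is a minimal sketch of the idea. It does not use the cso-classifier package itself. The tiny ontology, the two-dimensional vectors, the similarity threshold, and every function name below are hypothetical stand-ins chosen for illustration. Part-of-speech tagging and outlier removal are left out, and hierarchical enhancement is assumed to add only the direct broader topic.

```python
import math

# Hypothetical miniature ontology: topic -> broader (parent) topic.
# Illustrative only; the real CSO is far larger.
BROADER = {
    "neural networks": "machine learning",
    "machine learning": "artificial intelligence",
    "ontologies": "semantic web",
}
TOPICS = set(BROADER) | set(BROADER.values())

# Hypothetical 2-d vectors standing in for pre-computed word2vec embeddings.
VECTORS = {
    "deep": (0.9, 0.1),
    "neural networks": (0.8, 0.2),
    "ontologies": (0.1, 0.9),
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def syntactic_topics(text):
    """Stage 1: topics whose label appears literally in the text."""
    lowered = text.lower()
    return {topic for topic in TOPICS if topic in lowered}


def semantic_topics(text, threshold=0.95):
    """Stage 2: topics whose vector is close to the vector of some word."""
    found = set()
    for word in text.lower().split():
        word_vec = VECTORS.get(word)
        if word_vec is None:
            continue  # no embedding for this word
        for topic in TOPICS:
            topic_vec = VECTORS.get(topic)
            if topic_vec is not None and cosine(word_vec, topic_vec) >= threshold:
                found.add(topic)
    return found


def enhance(topics):
    """Stage 3 (simplified): add the direct broader topic of each match."""
    return topics | {BROADER[t] for t in topics if t in BROADER}


def classify(text):
    """Union the syntactic and semantic matches, then enhance them."""
    return sorted(enhance(syntactic_topics(text) | semantic_topics(text)))


print(classify("Deep models for ontologies"))
```

In this toy run, "ontologies" is matched literally, "neural networks" is reached only through the vector for "deep", and the two broader topics come from the enhancement step.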
Available on PyPI.
Stars: 95
Forks: 19
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Monthly downloads: 243
Commits (30d): 0
Dependencies: 12
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/angelosalatino/cso-classifier"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
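The same request can be made from Python with the standard library. This is a sketch only: the helper names are hypothetical, treating the `nlp` path segment as a category is an assumption read off the URL above, and the response is returned as raw text because its format is not described here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is path segments.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (hypothetical helper; 'category' is an assumed name)."""
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the raw response body as text, assuming UTF-8 encoding."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("nlp", "angelosalatino", "cso-classifier"))
```

The network call sits behind the `__main__` guard so that importing the helpers does not use up any of the daily request allowance.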
Related tools
giuseppebonaccorso/Reuters-21578-Classification
Text classification with Reuters-21578 datasets using Gensim Word2Vec and Keras LSTM
tblock/10kGNAD
Ten Thousand German News Articles Dataset for Topic Classification
NirantK/Hinglish
Hinglish Text Classification
yassersouri/classify-text
"20 Newsgroups" text classification with python
newsgac/platform
Platform for machine learning experiments developed in the project NEWSGAC