caojie54/OTSeq2Set
OTSeq2Set, XMTC
This project tags each text document with the relevant labels drawn from an extremely large set of possible labels, a task known as Extreme Multi-label Text Classification (XMTC). You provide the system with a collection of text documents and a vast vocabulary of possible labels, and it outputs the most appropriate labels for each document. This is useful for anyone who needs to automatically organize or tag large text datasets, such as legal professionals classifying documents, e-commerce managers tagging product descriptions, or content curators categorizing articles.
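The input/output contract described above can be illustrated generically. In this sketch a classifier has already given every label in the vocabulary a score for one document, and the k best labels are returned. Precision@k (the share of those k labels that are actually correct) is a common way to score the result. This is only an illustration of the task under assumed scores, not OTSeq2Set's own method. The function names, scores, and labels below are hypothetical.

```python
import heapq
from typing import Dict, List, Set


def top_k_labels(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k labels with the highest scores for a single document."""
    # heapq.nlargest avoids fully sorting a very large label vocabulary.
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [label for label, _ in best]


def precision_at_k(predicted: List[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k predicted labels that appear in the gold label set."""
    hits = sum(1 for label in predicted[:k] if label in gold)
    return hits / k


if __name__ == "__main__":
    # Hypothetical scores a classifier might assign to one product description.
    scores = {"electronics": 0.91, "audio": 0.84, "headphones": 0.77,
              "kitchen": 0.05, "garden": 0.02}
    predicted = top_k_labels(scores, k=3)
    print(predicted)  # ['electronics', 'audio', 'headphones']
    # Two of the three predicted labels are in the gold set.
    print(precision_at_k(predicted, {"audio", "headphones", "wireless"}, k=3))
```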
No commits in the last 6 months.
Use this if you need to assign multiple specific tags or categories from an extremely large list to individual text documents.
Not ideal if you're dealing with a small, fixed number of categories or if your text classification needs are simple.
Stars: 11
Forks: —
Language: Python
License: —
Category: —
Last pushed: Dec 31, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/caojie54/OTSeq2Set"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
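The same request can be made from Python with only the standard library. This is a minimal sketch that assumes the endpoint returns JSON and that the URL follows the pattern in the curl example (a category segment such as `nlp`, then `owner/repo`). The helper names `quality_url` and `fetch_quality` are hypothetical, and the key mechanism for the higher limit is not documented here, so the sketch sends anonymous requests only.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a category (e.g. 'nlp') and an 'owner/name' repo."""
    owner, name = repo.split("/", 1)
    # Quote each path segment separately so the owner/name slash stays a real separator.
    segments = [urllib.parse.quote(part) for part in (category, owner, name)]
    return "/".join([BASE_URL, *segments])


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint anonymously and parse the body as JSON (assumed format)."""
    request = urllib.request.Request(
        quality_url(category, repo), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("nlp", "caojie54/OTSeq2Set")` needs network access, and each call counts toward the 100 requests/day anonymous limit.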
Higher-rated alternatives
ryanjgallagher/shifterator
Interpretable data visualizations for understanding how texts differ at the word level
HLasse/TextDescriptives
A Python library for calculating a large variety of metrics from text
jboynyc/textnets
Text analysis with networks.
DemetersSon83/Quantitative-Discursive-Analysis
A tool for quantitatively measuring discursive similarity between bodies of text.
sciknoworg/tib-sid
TIB-SID: A bilingual (English/German) dataset of library catalog records with GND subject...