qyfang/TextClassification

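Text classification of Sina News articles, implemented with scikit-learn. The dataset contains 1 million documents in 10 categories, split 1:1 into training and test sets. SVM and Bayes are used as classifiers, with Bayes serving as the baseline.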
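The 1:1 train/test split described above can be sketched in plain Python. This is not the project's code: the function name `split_one_to_one` is hypothetical, and halving each class separately (a stratified split, so both halves keep the same class proportions) is an assumption, since the description only says the split is 1:1. With scikit-learn, `train_test_split(docs, labels, test_size=0.5, stratify=labels)` does the equivalent.

```python
import random
from collections import defaultdict


def split_one_to_one(docs, labels, seed=0):
    """Split documents 1:1 into (train, test) lists of (doc, label) pairs.

    Assumption: each class is halved separately (a stratified split), so
    train and test keep the same class proportions. With an odd class
    size, the extra document goes to the test half.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    by_label = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_label[label].append(doc)

    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        items = by_label[label]
        rng.shuffle(items)
        half = len(items) // 2
        train.extend((doc, label) for doc in items[:half])
        test.extend((doc, label) for doc in items[half:])
    return train, test
```

For 10 classes of roughly equal size, each half ends up with about 500,000 documents and the same per-class mix.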

Quality score: 36 / 100 (Emerging)

Implements a complete text classification pipeline: jieba tokenization, TF-IDF weighting, and process pools that parallelize both web scraping and preprocessing of the 1-million-document dataset. LinearSVC, tuned by grid search with 5-fold cross-validation, reaches an average F1-score of 0.90, well above the Naive Bayes baseline (0.79). Visualization tools compare the classifiers with confusion matrices and performance histograms.
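The preprocessing and weighting steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: `STOPWORDS`, `preprocess`, `preprocess_all` and `tfidf` are hypothetical names, whitespace splitting stands in for `jieba.lcut(text)`, and the IDF formula `ln((1 + n) / (1 + df)) + 1` with L2 normalization is the one scikit-learn's `TfidfVectorizer` uses by default (`smooth_idf=True`, `norm='l2'`), which the real pipeline would more likely call directly.

```python
import math
import re
from collections import Counter
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical stop-word list; the project's actual list is not shown here.
STOPWORDS = {"的", "了", "是"}


def preprocess(text):
    """Turn one raw document into a token list.

    Placeholder: whitespace splitting stands in for jieba.lcut(text),
    which a real pipeline would use to segment Chinese text.
    """
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation (incl. full-width) -> space
    return [tok for tok in text.split() if tok not in STOPWORDS]


def preprocess_all(texts, executor_cls=ProcessPoolExecutor, workers=4, chunksize=256):
    """Apply preprocess() across a worker pool.

    Executor.map preserves input order, so result[i] belongs to texts[i].
    chunksize batches work for process pools; thread pools ignore it.
    """
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(preprocess, texts, chunksize=chunksize))


def tfidf(tokenized_docs):
    """TF-IDF with raw term counts, smoothed IDF and L2 normalization.

    idf(t) = ln((1 + n) / (1 + df(t))) + 1, as in scikit-learn's defaults.
    Returns one {term: weight} dict per document.
    """
    n = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # document frequency: each term once per doc
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

    vectors = []
    for tokens in tokenized_docs:
        weights = {t: c * idf[t] for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors
```

The `executor_cls` parameter lets a quick test swap in `ThreadPoolExecutor`; for CPU-bound segmentation a process pool sidesteps the GIL, but then `preprocess` must stay a top-level (picklable) function, and on platforms that spawn workers the call should sit under `if __name__ == "__main__":`.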

110 stars. No commits in the last 6 months.

No license · Stale (6m) · No package · No dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 8 / 25
Community: 19 / 25
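The four sub-scores appear to add up to the overall score of 36 / 100. This is an inference from the numbers on this page, not a documented description of the scoring method:

```latex
0 + 9 + 8 + 19 = 36 \quad \text{out of} \quad 4 \times 25 = 100
```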


Stars: 110
Forks: 21
Language: Python
License: none
Last pushed: Dec 24, 2018
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/qyfang/TextClassification"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
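The curl request above can also be issued from Python with the standard library. A minimal sketch under stated assumptions: `quality_url` and `fetch_quality` are hypothetical helper names, other repositories are assumed to follow the same `/quality/<category>/<owner>/<repo>` pattern, and the response is assumed to be JSON with an undocumented schema. How the free key is sent is not described on this page, so the sketch only uses the keyless tier.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository.

    Assumption: other repositories follow the same
    /quality/<category>/<owner>/<repo> pattern as the curl example.
    """
    parts = (urllib.parse.quote(p, safe="") for p in (category, owner, repo))
    return API_BASE + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """GET the endpoint without an API key (the 100 requests/day tier).

    Assumption: the body is JSON; its schema is not documented here,
    so the decoded object is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `fetch_quality("nlp", "qyfang", "TextClassification")` requests the same URL as the curl command.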