qyfang/TextClassification

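Text classification of Sina news implemented with scikit-learn. The dataset contains 1 million documents in 10 categories, split 1:1 into training and test sets. The classifiers are SVM and Naive Bayes, with Naive Bayes serving as the baseline.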

Emerging

Implements a complete text classification pipeline using jieba tokenization, TF-IDF weighting, and parallel processing via process pools to accelerate both web scraping and preprocessing across the 1-million-document dataset. LinearSVC with grid search and 5-fold cross-validation achieves 0.90 average F1-score, significantly outperforming the Naive Bayes baseline (0.79), with visualization tools comparing classifier performance via confusion matrices and performance histograms.
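The classification stage described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the toy vocabularies, the `C` grid, the macro-averaged F1 metric, and helper names such as `make_corpus` and `run_demo` are all assumptions made for the example. Documents are assumed to be already segmented into words joined by spaces (the form jieba output usually takes after joining), so jieba itself is not needed to run it.

```python
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy vocabularies standing in for two news categories.
VOCAB = {
    "sports": ["球队", "比赛", "进球", "冠军", "联赛", "教练"],
    "finance": ["股票", "市场", "银行", "利率", "投资", "基金"],
}
C_GRID = [0.1, 1.0, 10.0]  # assumed search grid, not taken from the repository


def make_corpus(docs_per_class, seed):
    """Build pre-segmented documents (words joined by spaces) with labels."""
    rng = random.Random(seed)
    texts, labels = [], []
    for label, words in VOCAB.items():
        for _ in range(docs_per_class):
            texts.append(" ".join(rng.choices(words, k=8)))
            labels.append(label)
    return texts, labels


def tfidf():
    # Input is already segmented, so treat every whitespace-separated
    # chunk as one token instead of using the default token pattern.
    return TfidfVectorizer(token_pattern=r"\S+")


def run_demo():
    # Training and test sets of equal size, mirroring the 1:1 split.
    x_train, y_train = make_corpus(20, seed=0)
    x_test, y_test = make_corpus(20, seed=1)

    # LinearSVC tuned by grid search with 5-fold cross-validation.
    svm = GridSearchCV(
        Pipeline([("tfidf", tfidf()), ("clf", LinearSVC())]),
        param_grid={"clf__C": C_GRID},
        cv=5,
        scoring="f1_macro",
    )
    svm.fit(x_train, y_train)

    # Naive Bayes baseline on the same TF-IDF features.
    baseline = Pipeline([("tfidf", tfidf()), ("clf", MultinomialNB())])
    baseline.fit(x_train, y_train)

    svm_f1 = f1_score(y_test, svm.predict(x_test), average="macro")
    nb_f1 = f1_score(y_test, baseline.predict(x_test), average="macro")
    return svm_f1, nb_f1, svm.best_params_["clf__C"]


svm_f1, nb_f1, best_c = run_demo()
print(f"LinearSVC macro F1: {svm_f1:.2f} (best C={best_c})")
print(f"Naive Bayes macro F1: {nb_f1:.2f}")
```

On a toy corpus like this the two scores say nothing about real performance; the point is only the shape of the comparison, with both classifiers sharing the same TF-IDF features and being scored on the same held-out half.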

110 stars. No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 8 / 25

Community 19 / 25

Stars: 110
Forks:
Language: Python
License: —

Higher-rated alternatives

hankcs/text-classification-svm

The missing SVM-based text classification module implementing HanLP's interface

derhuerst/nbayes

A Naive Bayes classifier written in JavaScript.

ningchaoar/UnsupervisedTextClassification

Keyword-based unsupervised text classification; Implementation for paper "Text Classification by Bootstrapping with Keywords, EM...

fullstackyang/article-classifier

A WeChat official account article classifier based on Naive Bayes

mustafaturan/omnicat-bayes

Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
