Chinese NLP Toolkits

Comprehensive NLP toolkits and frameworks designed specifically for Chinese language processing, including segmentation, POS tagging, NER, sentiment analysis, and classical Chinese support. This category does not include language-agnostic NLP tools, machine translation systems, or tools focused on non-Chinese languages.

There are 73 Chinese NLP toolkits tracked. 3 score 70 or higher (Verified tier). The highest-rated is PyThaiNLP/pythainlp at 93/100 with 1,117 stars and 1,203,313 monthly downloads. 2 of the top 10 are actively maintained.

Get all 73 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=chinese-nlp-toolkits&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
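The same request can be made from Python. The sketch below uses only the standard library and the three query parameters visible in the curl example (domain, subcategory, limit). The helper names build_url and fetch_projects are illustrative, not part of the API, and the shape of the JSON response is not documented on this page, so the code returns the decoded body without assuming any field names. Whether a larger limit value returns all 73 projects is also an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken verbatim from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="chinese-nlp-toolkits", limit=20):
    """Assemble the query URL from the parameters seen in the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON body (hypothetical helper).

    The response schema is an unknown here, so the decoded object is
    returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch_projects() performs one request against the keyless quota of 100 requests/day, so it is worth caching the result locally while exploring the data.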

# | Tool | Description | Score | Tier
1 | PyThaiNLP/pythainlp | Thai natural language processing in Python | 93 | Verified
2 | hankcs/HanLP | Natural Language Processing for the next decade. Tokenization,... | 77 | Verified
3 | dongrixinyu/JioNLP | A Chinese NLP Preprocessing & Parsing Package (accurate, efficient, easy to use)... | 70 | Verified
4 | hankcs/pyhanlp | Chinese word segmentation | 68 | Established
5 | jacksonllee/pycantonese | Cantonese Linguistics and NLP | 67 | Established
6 | go-ego/gse | Go efficient multilingual NLP and text segmentation; support English,... | 65 | Established
7 | baidu/lac | Baidu NLP: word segmentation, POS tagging, named entity recognition, word importance | 65 | Established
8 | ownthink/Jiagu | Jiagu deep learning NLP tool: knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering | 64 | Established
9 | SeanLee97/xmnlp | xmnlp: provides Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radical lookup, sentence representation, and text similarity computation | 56 | Established
10 | yongzhuo/Macropodus | NLP tool Macropodus, based on an Albert+BiLSTM+CRF deep learning architecture: Chinese word segmentation, POS tagging, named entity recognition, new word discovery, keyword extraction, text summarization... | 54 | Established
11 | NLPchina/ansj_seg | ansj segmentation: a true Java implementation of ict, with segmentation quality and speed exceeding the open-source ict version. Chinese word segmentation, person name recognition, POS tagging, user-defined dictionaries | 51 | Established
12 | linonetwo/segmentit | Chinese word segmentation package usable in any JS environment, forked from leizongmin/node-segment | 50 | Established
13 | messense/jieba-rs | The Jieba Chinese Word Segmentation Implemented in Rust | 50 | Established
14 | jiaeyan/Jiayan | Jiayan (甲言), an NLP toolkit focused on Classical Chinese processing, supporting lexicon construction, word segmentation, POS tagging, sentence segmentation, and punctuation. Jiayan, the 1st... | 50 | Established
15 | monpa-team/monpa | MONPA (罔拍) is a multi-task model providing Traditional Chinese word segmentation, POS tagging, and named entity recognition | 48 | Emerging
16 | OpenPecha/pybo | 🦜 NLP for Tibetan, in Python. | 47 | Emerging
17 | lionsoul2014/jcseg | Jcseg is a light weight NLP framework developed with Java. Provide CJK and... | 44 | Emerging
18 | suminb/hanja | Hangul and Hanja library | 43 | Emerging
19 | yaoguangluo/Deta_Parser | Fast Chinese word segmentation and analysis | 43 | Emerging
20 | hankcs/hanlp-lucene-plugin | HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr | 43 | Emerging
21 | XiaoMi/MiNLP | XiaoMi Natural Language Processing Toolkits | 43 | Emerging
22 | qinwf/jiebaR | Chinese text segmentation with R. (Documentation updated 🎉... | 43 | Emerging
23 | notoriouslab/trad-zh-search | trad-zh-search: a Traditional Chinese text preprocessing tool that works standalone with mainstream search engines; CKIP word segmentation plus bigram index generation, with an optional domain dictionary system | 43 | Emerging
24 | hankcs/multi-criteria-cws | Simple Solution for Multi-Criteria Chinese Word Segmentation | 42 | Emerging
25 | jimichan/mynlp | A production-grade, high-performance, modular, extensible Chinese NLP toolkit (Chinese word segmentation, averaged perceptron, fastText, pinyin, new word discovery, segmentation correction, BM25, person name recognition, named entities, custom dictionaries) | 42 | Emerging
26 | smoothnlp/SmoothNLP | An NLP Toolset With A Focus on Explainable Inference | 42 | Emerging
27 | houbb/opencc4j | 🇨🇳Open Chinese Convert is an opensource project for conversion between... | 41 | Emerging
28 | KoichiYasuoka/UD-Kanbun | Tokenizer POS-tagger and Dependency-parser for Classical Chinese | 41 | Emerging
29 | supercoderhawk/DeepLearning_NLP | Deep learning-based natural language processing library | 40 | Emerging
30 | hankcs/ID-CNN-CWS | Source codes and corpora of paper "Iterated Dilated Convolutions for Chinese... | 40 | Emerging
31 | notAI-tech/deepsegment | A sentence segmenter that actually works! | 40 | Emerging
32 | kirklin/go-swd | Sensitive Words Detection: a high-performance sensitive word detection and filtering library, based on Go... | 39 | Emerging
33 | houbb/segment | The jieba-analysis tool for Java. (A more flexible, elegant, easy-to-use, high-performance Java word segmentation implementation built on the jieba dictionary. Supports POS tagging.) | 38 | Emerging
34 | StarCC0/starcc-py | Simplified-Traditional Chinese conversion. Python implementation of StarCC, the next generation of... | 37 | Emerging
35 | houbb/pinyin | The high-performance pinyin tool for Java. (High-performance Chinese-to-pinyin conversion tool for Java. Supports homophones.) | 37 | Emerging
36 | houbb/nlp-hanzi-similar | The hanzi similarity tool. (Chinese character similarity calculator using a visually-similar-character algorithm. Useful for correcting handwritten character recognition, text obfuscation, etc.) | 37 | Emerging
37 | junchaoIU/QCNLP | A Preprocessing & Parsing tool for Chinese Natural Language (an efficient Chinese preprocessing and NLP parsing tool) | 36 | Emerging
38 | mxcoras/jieba-next | Use Rust to speed up jieba: an efficient, modern Chinese word segmentation library | 33 | Emerging
39 | google/budou | Budou is an automatic organizer tool for beautiful line breaking in CJK... | 33 | Emerging
40 | KoichiYasuoka/SuPar-Kanbun | Tokenizer POS-tagger and Dependency-parser for Classical Chinese | 33 | Emerging
41 | cyd622/nlp-jieba | Jieba Chinese word segmentation (PHP version): aiming to be the best PHP Chinese word segmentation component | 31 | Emerging
42 | KoichiYasuoka/GuwenCOMBO | Tokenizer POS-tagger and Dependency-parser for Classical Chinese | 31 | Emerging
43 | shibing624/crf-seg | crf-seg: a Chinese word segmentation tool for production environments, with customizable corpora and models, a clear architecture, and good segmentation quality. Written in Java. | 31 | Emerging
44 | bububa/jiagu | Jiagu deep learning NLP tool: knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering | 30 | Emerging
45 | wittawatj/jtcc | Java library to tokenize Thai text into a list of TCCs | 30 | Emerging
46 | KoichiYasuoka/SuPar-Kanbun-1.3.4 | Tokenizer POS-tagger and Dependency-parser for Classical Chinese | 27 | Experimental
47 | sdq/FenciMac | Chinese word segmentation for Mac | 27 | Experimental
48 | supercoderhawk/DeepNLP | Deep learning-based natural language processing library | 27 | Experimental
49 | jamsinclair/budou-node | Node.js port of Budou, an automatic organizer tool for beautiful line... | 25 | Experimental
50 | jason2506/esapp | An unsupervised Chinese word segmentation tool. | 25 | Experimental
51 | zxgineng/deepnlp | NLP practice project from my early days | 25 | Experimental
52 | dogterbox/thai-word-segmentation | Thai word segmentation using deep learning | 24 | Experimental
53 | PyThaiNLP/Han-solo | 🪿 Han-solo: Thai syllable segmenter | 22 | Experimental
54 | bryanchw/Traditional-Chinese-Stopwords-and-Punctuations-Library | Created a Python library specifically for Traditional Chinese stopwords and... | 21 | Experimental
55 | hope-data-science/chinese_NLP | Chinese natural language processing | 21 | Experimental
56 | limchiahooi/nlp-chinese | This repo contains my Natural Language Processing (NLP) in Chinese project. | 21 | Experimental
57 | mathsyouth/awesome-word-segmentation | A curated list of resources dedicated to word segmentation | 20 | Experimental
58 | jsrpy/Chinese-NLP-Jieba | This is an introduction to Chinese word segmentation using Jieba. | 20 | Experimental
59 | cxumol/jieba-wasm-html | Fast Jieba Chinese text segmentation on browser without backend/NPM ... | 20 | Experimental
60 | shibing624/pinyin-tokenizer | pinyintokenizer: a pinyin tokenizer that splits continuous pinyin into a list of single-character pinyin syllables. | 20 | Experimental
61 | wchan757/Cantonese_Word_Segmentation | Dictionary for Cantonese word segmentation | 20 | Experimental
62 | Lapis-Hong/fast-xinci | New word discovery: Chinese New Words Finder (C++ library). | 19 | Experimental
63 | gyatso736/-Tibetan-tokenizer- | This Tibetan tokenizer based on Bi-LSTM+CRF methods, it was created with the... | 19 | Experimental
64 | JackHCC/Chinese-Tokenization | Chinese word segmentation implemented with traditional methods (N-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pretrained methods (BERT, etc.) [The word... | 18 | Experimental
65 | NoHeartPen/Kanji2Hanzi | This project is used to convert Japanese Kanji to Simplified Chinese characters. | 16 | Experimental
66 | NicoACloutier/Hanzi.jl | A Julia library to romanize Hanzi. | 15 | Experimental
67 | Jyutt/jieba-hs | Haskell Implementation of Jieba Chinese Segmentation Algorithm | 13 | Experimental
68 | yihong-chen/chinese-word-segmentation | Simple Chinese word segmentation with experiments on the PKU dataset | 13 | Experimental
69 | wittawatj/ctwt | Classifier-based Thai Word Tokenizer | 13 | Experimental
70 | sinostudy/pinyin | Convert between different representations of Hànyǔ Pīnyīn. | 13 | Experimental
71 | bmwj/Tibetan_information_processing | Tibetan information processing toolkit (Tibetan_Information_Processing_Toolkit); features include generating the complete Tibetan character set, intelligently identifying Tibetan character components... | 11 | Experimental
72 | Ancastal/HSK-Character-Profiler | HSK Character Profiler is a Python tool that analyzes Chinese character... | 10 | Experimental
73 | StarCC0/starcc0.github.io | Simplified-Traditional Chinese conversion. StarCC is the next generation of Simplified-Traditional Chinese... | 10 | Experimental
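The tier labels in the table above appear to follow score bands. The sketch below maps a score to a tier using cutoffs inferred from the listed rows only (70 is the lowest Verified score, 50 the lowest Established, 30 the lowest Emerging). These cutoffs are assumptions, not documented rules: scores such as 69, 49, 28, and 29 do not occur in the table, so the real boundaries could sit anywhere in those gaps. The function name tier_for is hypothetical.

```python
def tier_for(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    Cutoffs are ASSUMED from the rows in the table; the page does not
    state the actual thresholds.
    """
    if score >= 70:
        return "Verified"
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Boundary rows taken from the table: (score, tier shown on the page).
OBSERVED = [
    (93, "Verified"), (70, "Verified"),
    (68, "Established"), (50, "Established"),
    (48, "Emerging"), (30, "Emerging"),
    (27, "Experimental"), (10, "Experimental"),
]

# True when the assumed cutoffs reproduce every observed boundary row.
consistent = all(tier_for(score) == tier for score, tier in OBSERVED)
```

A check like this is mainly useful for sanity-testing downloaded data: if a row's tier disagrees with tier_for, either the inferred cutoffs are wrong or tiers depend on more than the score.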

Comparisons in this category