Chinese NLP Toolkits
Comprehensive NLP toolkits and frameworks specifically designed for Chinese language processing, including segmentation, POS tagging, NER, sentiment analysis, and classical Chinese support. Does NOT include language-agnostic NLP tools, machine translation systems, or tools focused on non-Chinese languages.
73 Chinese NLP toolkits are tracked. 3 score above 70 (verified tier). The highest-rated is PyThaiNLP/pythainlp at 93/100 with 1,117 stars and 1,203,313 monthly downloads. 2 of the top 10 are actively maintained.
Get all 73 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=chinese-nlp-toolkits&limit=20"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
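The same request can be made from Python. The following is a minimal sketch using only the standard library; it reuses the endpoint and query parameters from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described here, the decoded JSON is returned unchanged. Whether a `limit` above 20 returns all 73 projects in one call is an assumption that has not been verified.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="chinese-nlp-toolkits", limit=20):
    """Build the query URL with the same parameters the curl command uses."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON payload.

    The response schema is an unknown here, so the decoded object is
    returned as-is rather than being unpacked into specific fields.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs one request against the keyless quota of 100 requests/day, so it is worth caching the result locally rather than re-fetching on every run.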
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | PyThaiNLP/pythainlp: Thai natural language processing in Python | | Verified |
| 2 | hankcs/HanLP: Natural Language Processing for the next decade. Tokenization,... | | Verified |
| 3 | dongrixinyu/JioNLP: Chinese NLP preprocessing and parsing toolkit: accurate, efficient, easy to use. A Chinese NLP Preprocessing & Parsing Package... | | Verified |
| 4 | hankcs/pyhanlp: Chinese word segmentation | | Established |
| 5 | jacksonllee/pycantonese: Cantonese Linguistics and NLP | | Established |
| 6 | go-ego/gse: Go efficient multilingual NLP and text segmentation; support English,... | | Established |
| 7 | baidu/lac: Baidu NLP: word segmentation, POS tagging, named entity recognition, word importance | | Established |
| 8 | ownthink/Jiagu: Jiagu deep learning NLP tool: knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering | | Established |
| 9 | SeanLee97/xmnlp: xmnlp: provides Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radicals, sentence representation, text similarity computation, and more | | Established |
| 10 | yongzhuo/Macropodus: Macropodus NLP tool, based on an Albert+BiLSTM+CRF deep learning architecture: Chinese word segmentation, POS tagging, named entity recognition, new word discovery, keywords, text summarization... | | Established |
| 11 | NLPchina/ansj_seg: ansj segmentation, a true Java implementation of ict. Segmentation quality and speed both exceed the open-source version of ict. Chinese word segmentation, person name recognition, POS tagging, user-defined dictionaries | | Established |
| 12 | linonetwo/segmentit: A Chinese word segmentation package usable in any JS environment, fork from leizongmin/node-segment | | Established |
| 13 | messense/jieba-rs: The Jieba Chinese Word Segmentation Implemented in Rust | | Established |
| 14 | jiaeyan/Jiayan: Jiayan (甲言), an NLP toolkit focused on processing Classical Chinese, supporting Classical Chinese lexicon construction, word segmentation, POS tagging, sentence segmentation, and punctuation. Jiayan, the 1st... | | Established |
| 15 | monpa-team/monpa: MONPA (罔拍) is a multi-task model providing Traditional Chinese word segmentation, POS tagging, and named entity recognition | | Emerging |
| 16 | OpenPecha/pybo: 🦜 NLP for Tibetan, in Python. | | Emerging |
| 17 | lionsoul2014/jcseg: Jcseg is a light weight NLP framework developed with Java. Provide CJK and... | | Emerging |
| 18 | suminb/hanja: Hangul and Hanja library | | Emerging |
| 19 | yaoguangluo/Deta_Parser: Fast Chinese word segmentation and analysis | | Emerging |
| 20 | hankcs/hanlp-lucene-plugin: HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr | | Emerging |
| 21 | XiaoMi/MiNLP: XiaoMi Natural Language Processing Toolkits | | Emerging |
| 22 | qinwf/jiebaR: Chinese text segmentation with R (documentation updated 🎉... | | Emerging |
| 23 | notoriouslab/trad-zh-search: trad-zh-search is a text preprocessing tool built specifically for Traditional Chinese that can be paired on its own with mainstream search engines; CKIP word segmentation plus bigram index generation, with an optional domain dictionary system | | Emerging |
| 24 | hankcs/multi-criteria-cws: Simple Solution for Multi-Criteria Chinese Word Segmentation | | Emerging |
| 25 | jimichan/mynlp: A production-grade, high-performance, modular, extensible Chinese NLP toolkit (Chinese word segmentation, averaged perceptron, fastText, pinyin, new word discovery, segmentation correction, BM25, person name recognition, named entities, custom dictionaries) | | Emerging |
| 26 | smoothnlp/SmoothNLP: An NLP Toolset With A Focus on Explainable Inference | | Emerging |
| 27 | houbb/opencc4j: 🇨🇳Open Chinese Convert is an opensource project for conversion between... | | Emerging |
| 28 | KoichiYasuoka/UD-Kanbun: Tokenizer POS-tagger and Dependency-parser for Classical Chinese | | Emerging |
| 29 | supercoderhawk/DeepLearning_NLP: Deep learning based natural language processing library | | Emerging |
| 30 | hankcs/ID-CNN-CWS: Source codes and corpora of paper "Iterated Dilated Convolutions for Chinese... | | Emerging |
| 31 | notAI-tech/deepsegment: A sentence segmenter that actually works! | | Emerging |
| 32 | kirklin/go-swd: Sensitive Words Detection: a high-performance sensitive word detection and filtering library, based on Go... | | Emerging |
| 33 | houbb/segment: The jieba-analysis tool for Java (a more flexible, elegant, easy-to-use, high-performance Java segmentation implementation built on the jieba dictionary; supports POS tagging) | | Emerging |
| 34 | StarCC0/starcc-py: Simplified-Traditional Chinese conversion. Python implementation of StarCC, the next generation of... | | Emerging |
| 35 | houbb/pinyin: The high performance pinyin tool for Java (high-performance Chinese-to-pinyin conversion for Java; supports homophones) | | Emerging |
| 36 | houbb/nlp-hanzi-similar: The hanzi similar tool (Chinese character similarity computation, an algorithm for visually similar characters; usable for correcting handwritten character recognition, text obfuscation, and so on) | | Emerging |
| 37 | junchaoIU/QCNLP: A Preprocessing & Parsing tool for Chinese Natural Language (an efficient Chinese preprocessing and NLP parsing tool) | | Emerging |
| 38 | mxcoras/jieba-next: Use Rust to Speed up jieba: an efficient, modern Chinese word segmentation library | | Emerging |
| 39 | google/budou: Budou is an automatic organizer tool for beautiful line breaking in CJK... | | Emerging |
| 40 | KoichiYasuoka/SuPar-Kanbun: Tokenizer POS-tagger and Dependency-parser for Classical Chinese | | Emerging |
| 41 | cyd622/nlp-jieba: Jieba Chinese word segmentation (PHP version): aiming to be the best PHP Chinese word segmentation component | | Emerging |
| 42 | KoichiYasuoka/GuwenCOMBO: Tokenizer POS-tagger and Dependency-parser for Classical Chinese | | Emerging |
| 43 | shibing624/crf-seg: crf-seg: a Chinese word segmentation tool for production environments, with customizable corpora and models, a clear architecture, and good segmentation quality. Written in Java. | | Emerging |
| 44 | bububa/jiagu: Jiagu deep learning NLP tool: knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering | | Emerging |
| 45 | wittawatj/jtcc: Java library to tokenize Thai text into a list of TCCs | | Emerging |
| 46 | KoichiYasuoka/SuPar-Kanbun-1.3.4: Tokenizer POS-tagger and Dependency-parser for Classical Chinese | | Experimental |
| 47 | sdq/FenciMac: Chinese word segmentation, Mac version | | Experimental |
| 48 | supercoderhawk/DeepNLP: Deep learning based natural language processing library | | Experimental |
| 49 | jamsinclair/budou-node: Node.js port of Budou, an automatic organizer tool for beautiful line... | | Experimental |
| 50 | jason2506/esapp: An unsupervised Chinese word segmentation tool. | | Experimental |
| 51 | zxgineng/deepnlp: An NLP practice project from the author's early days | | Experimental |
| 52 | dogterbox/thai-word-segmentation: Thai word segmentation using deep learning | | Experimental |
| 53 | PyThaiNLP/Han-solo: 🪿 Han-solo: Thai syllable segmenter | | Experimental |
| 54 | bryanchw/Traditional-Chinese-Stopwords-and-Punctuations-Library: Created a Python library specifically for Traditional Chinese stopwords and... | | Experimental |
| 55 | hope-data-science/chinese_NLP: Chinese natural language processing | | Experimental |
| 56 | limchiahooi/nlp-chinese: This repo contains my Natural Language Processing (NLP) in Chinese project. | | Experimental |
| 57 | mathsyouth/awesome-word-segmentation: A curated list of resources dedicated to word segmentation | | Experimental |
| 58 | jsrpy/Chinese-NLP-Jieba: This is an introduction to Chinese word segmentation using Jieba. | | Experimental |
| 59 | cxumol/jieba-wasm-html: Fast Jieba Chinese text segmentation on browser without backend/NPM \|... | | Experimental |
| 60 | shibing624/pinyin-tokenizer: pinyintokenizer, a pinyin tokenizer that splits continuous pinyin into a list of single-character pinyin syllables. | | Experimental |
| 61 | wchan757/Cantonese_Word_Segmentation: Dictionary for Cantonese word segmentation | | Experimental |
| 62 | Lapis-Hong/fast-xinci: Chinese New Words Finder (C++ library). | | Experimental |
| 63 | gyatso736/-Tibetan-tokenizer-: This Tibetan tokenizer based on Bi-LSTM+CRF methods, it was created with the... | | Experimental |
| 64 | JackHCC/Chinese-Tokenization: Implementations of the Chinese word segmentation task using traditional methods (N-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pretrained methods (BERT, etc.) [The word... | | Experimental |
| 65 | NoHeartPen/Kanji2Hanzi: This project is used to convert Japanese Kanji to Simplified Chinese characters. | | Experimental |
| 66 | NicoACloutier/Hanzi.jl: A Julia library to romanize Hanzi. | | Experimental |
| 67 | Jyutt/jieba-hs: Haskell Implementation of Jieba Chinese Segmentation Algorithm | | Experimental |
| 68 | yihong-chen/chinese-word-segmentation: Simple Chinese word segmentation with experiments on the PKU dataset | | Experimental |
| 69 | wittawatj/ctwt: Classifier-based Thai Word Tokenizer | | Experimental |
| 70 | sinostudy/pinyin: Convert between different representations of Hànyǔ Pīnyīn. | | Experimental |
| 71 | bmwj/Tibetan_information_processing: Tibetan information processing toolkit (Tibetan_Information_Processing_Toolkit); its features include generating the complete Tibetan character set, intelligently identifying Tibetan character components... | | Experimental |
| 72 | Ancastal/HSK-Character-Profiler: HSK Character Profiler is a Python tool that analyzes Chinese character... | | Experimental |
| 73 | StarCC0/starcc0.github.io: Simplified-Traditional Chinese conversion. StarCC is the next generation of Simplified-Traditional Chinese... | | Experimental |