NLPchina/ansj_seg
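Ansj segmentation: a true Java implementation of ICT. Its segmentation quality and speed both exceed the open-source version of ICT. Chinese word segmentation, person-name recognition, part-of-speech tagging, user-defined dictionaries.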
Built on n-Gram+CRF+HMM algorithms, Ansj achieves ~2 million characters per second throughput with 96%+ accuracy for Chinese tokenization. Beyond segmentation, it provides named entity recognition for Chinese names, keyword extraction, automatic summarization, and pattern-based recognition modules (ID cards, timestamps, URLs) with extensible user-defined dictionaries. Available as a Maven dependency targeting Java NLP pipelines that require production-grade Chinese text processing.
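To give a feel for what a pattern-based recognition module for ID cards, timestamps, and URLs does, here is a minimal sketch that uses only `java.util.regex`. It is not Ansj's implementation and does not call the Ansj API. The class name `PatternRecognitionSketch`, the `recognize` method, the label strings, and the three simplified patterns are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: hypothetical patterns, not taken from Ansj.
final class PatternRecognitionSketch {
    // LinkedHashMap keeps the patterns in insertion order.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        // 18-character ID number: 17 digits plus a digit or X, not embedded in a longer digit run.
        PATTERNS.put("ID_CARD", Pattern.compile("(?<!\\d)\\d{17}[\\dXx](?!\\d)"));
        // Timestamp in the form yyyy-MM-dd HH:mm:ss (format only, no range checks).
        PATTERNS.put("TIMESTAMP", Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"));
        // http or https URL made of common ASCII URL characters.
        PATTERNS.put("URL", Pattern.compile("https?://[A-Za-z0-9._~:/?#@!$&'()*+,;=%-]+"));
    }

    // Returns "LABEL:match" strings, grouped by pattern order rather than text position.
    static List<String> recognize(String text) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                hits.add(entry.getKey() + ":" + matcher.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "订单创建于 2023-11-19 08:30:00,详情见 https://example.com/order?id=42 。";
        recognize(text).forEach(System.out::println);
    }
}
```

A real segmenter would merge such matches into its token stream so that, for example, a URL is emitted as one token instead of being split at punctuation. This sketch only extracts the matches.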
6,544 stars. No commits in the last 6 months.
Stars: 6,544
Forks: 2,291
Language: Java
License: Apache-2.0
Category:
Last pushed: Nov 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/NLPchina/ansj_seg"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
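For Java pipelines, the same request can be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11 or later). The class name `QualityApiSketch` and the `buildRequest` helper are hypothetical. The sketch also assumes that other repositories follow the same `owner/repo` path pattern as the curl URL above. Because the response schema is not documented here, the body is printed as raw text and not parsed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Illustrative only: mirrors the curl command above using the JDK HTTP client.
final class QualityApiSketch {
    // Base path copied from the curl example; reuse for other repositories is an assumption.
    static final String BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp/";

    // Builds a GET request for a repository given as "owner/repo".
    static HttpRequest buildRequest(String ownerAndRepo) {
        return HttpRequest.newBuilder(URI.create(BASE + ownerAndRepo))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest("NLPchina/ansj_seg"),
                HttpResponse.BodyHandlers.ofString());
        // The response format is not specified in this page, so print it unparsed.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

No API key is sent, so requests made this way count against the keyless daily limit.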
Related tools
PyThaiNLP/pythainlp
Thai natural language processing in Python
hankcs/HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named...
jacksonllee/pycantonese
Cantonese Linguistics and NLP
dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com
hankcs/pyhanlp
Chinese word segmentation