NLPchina/ansj_seg

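Ansj segmenter. A true Java implementation of ICT. Its segmentation quality and speed both surpass the open-source version of ICT. Chinese word segmentation, person-name recognition, part-of-speech tagging, user-defined dictionaries.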

/ 100

Established

Built on n-Gram, CRF, and HMM algorithms, Ansj achieves a throughput of about 2 million characters per second with 96%+ accuracy for Chinese tokenization. Beyond segmentation, it provides Chinese person-name recognition, keyword extraction, automatic summarization, and pattern-based recognition modules (ID cards, timestamps, URLs), along with extensible user-defined dictionaries. It is available as a Maven dependency and targets Java NLP pipelines that require production-grade Chinese text processing.
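To make the idea of dictionary-driven segmentation with user-defined entries concrete, here is a minimal Java sketch. It is not Ansj's implementation and does not use Ansj's API: the class `ToySegmenter`, its methods, the frequencies, and the unknown-character cost are all hypothetical, chosen only for illustration. It picks the lowest-cost split of a sentence using word frequencies alone (a unigram model), whereas Ansj is described above as combining n-Gram, CRF, and HMM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy unigram segmenter for illustration only; NOT Ansj's algorithm or API. */
class ToySegmenter {
    // Assumed corpus size used to turn a frequency into a probability.
    private static final double TOTAL = 1_000_000.0;
    // Assumed cost of an out-of-dictionary single character (worse than any word).
    private static final double UNKNOWN_COST = 20.0;

    private final Map<String, Double> cost = new HashMap<>();
    private int maxLen = 1;

    /** Adds or overrides a word, as a user-defined dictionary entry would. */
    void addWord(String word, int freq) {
        cost.put(word, -Math.log(freq / TOTAL)); // frequent words are cheaper
        maxLen = Math.max(maxLen, word.length());
    }

    /** Returns the split with the lowest total cost (dynamic programming). */
    List<String> segment(String text) {
        int n = text.length(); // assumes characters in the Basic Multilingual Plane
        double[] best = new double[n + 1];
        int[] prev = new int[n + 1];
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxLen); start < end; start++) {
                Double c = cost.get(text.substring(start, end));
                if (c == null) {
                    if (end - start > 1) continue; // unknown multi-char span: skip
                    c = UNKNOWN_COST;              // unknown single char: fallback
                }
                if (best[start] + c < best[end]) {
                    best[end] = best[start] + c;
                    prev[end] = start;
                }
            }
        }
        // Walk back from the end of the text to recover the chosen words.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(text.substring(prev[end], end));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        ToySegmenter seg = new ToySegmenter();
        seg.addWord("研究", 500);   // "research"
        seg.addWord("研究生", 200); // "graduate student"
        seg.addWord("生命", 400);   // "life"
        seg.addWord("命", 50);      // "fate"
        seg.addWord("起源", 100);   // "origin"
        System.out.println(seg.segment("研究生命起源"));
        seg.addWord("生命起源", 1000); // user-defined entry: "origin of life"
        System.out.println(seg.segment("研究生命起源"));
    }
}
```

With the made-up frequencies above, the ambiguous prefix 研究生 loses to 研究 followed by 生命, and adding 生命起源 as a user entry changes the result to two words. A real segmenter would also score transitions between adjacent words rather than each word in isolation.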

6,544 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 25 / 25


Stars: 6,544

Forks: 2,291

Language: Java

License: Apache-2.0

Related tools

PyThaiNLP/pythainlp

Thai natural language processing in Python

100

hankcs/HanLP

Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named...

jacksonllee/pycantonese

Cantonese Linguistics and NLP

dongrixinyu/JioNLP

A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com

hankcs/pyhanlp

Chinese word segmentation
