NLPchina/ansj_seg

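Ansj word segmentation: a true Java implementation of ICT. Its segmentation quality and speed both exceed the open-source version of ICT. Chinese word segmentation, person-name recognition, part-of-speech tagging, user-defined dictionaries.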

51 / 100 (Established)

Built on n-Gram+CRF+HMM algorithms, Ansj achieves ~2 million characters per second throughput with 96%+ accuracy for Chinese tokenization. Beyond segmentation, it provides named entity recognition for Chinese names, keyword extraction, automatic summarization, and pattern-based recognition modules (ID cards, timestamps, URLs) with extensible user-defined dictionaries. Available as a Maven dependency targeting Java NLP pipelines that require production-grade Chinese text processing.
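
To make the idea of pattern-based recognition concrete, here is a minimal sketch in plain Java of how ID card numbers, timestamps, and URLs might be picked out of text with regular expressions. It is not Ansj's implementation and uses none of its classes: `PatternRecognizer`, `recognize`, and `hasValidCheckDigit` are hypothetical names chosen for illustration, the timestamp format `yyyy-MM-dd HH:mm:ss` is an assumption, and the ID check uses the standard 18-digit resident ID checksum (weighted sum modulo 11).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from Ansj.
class PatternRecognizer {
    // 17 digits plus a digit or X, not embedded in a longer run of digits.
    private static final Pattern ID_CARD =
            Pattern.compile("(?<!\\d)\\d{17}[\\dXx](?!\\d)");
    // Assumed timestamp layout: yyyy-MM-dd HH:mm:ss.
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");
    // Deliberately simple http/https URL pattern (ASCII host and path only).
    private static final Pattern URL =
            Pattern.compile("https?://[\\w.-]+(?:/[\\w./%-]*)?");

    // Weights and check characters of the 18-digit resident ID checksum.
    private static final int[] WEIGHTS =
            {7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2};
    private static final String CHECK_CHARS = "10X98765432";

    // Returns true when the 18th character matches the weighted checksum.
    static boolean hasValidCheckDigit(String id) {
        int sum = 0;
        for (int i = 0; i < 17; i++) {
            sum += (id.charAt(i) - '0') * WEIGHTS[i];
        }
        return CHECK_CHARS.charAt(sum % 11) == Character.toUpperCase(id.charAt(17));
    }

    // Returns matches as "TYPE:value" strings, grouped by type.
    static List<String> recognize(String text) {
        List<String> found = new ArrayList<>();
        Matcher ids = ID_CARD.matcher(text);
        while (ids.find()) {
            // Keep only candidates whose checksum is consistent.
            if (hasValidCheckDigit(ids.group())) {
                found.add("ID:" + ids.group());
            }
        }
        Matcher times = TIMESTAMP.matcher(text);
        while (times.find()) {
            found.add("TIME:" + times.group());
        }
        Matcher urls = URL.matcher(text);
        while (urls.find()) {
            found.add("URL:" + urls.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "id 11010519491231002X at 2023-11-19 08:30:00 "
                + "see https://github.com/NLPchina/ansj_seg";
        for (String hit : recognize(text)) {
            System.out.println(hit);
        }
    }
}
```

A statistical segmenter benefits from this kind of pre- or post-processing because tokens such as IDs and URLs follow rigid formats that rules capture more reliably than a language model does.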

6,544 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25

Stars: 6,544
Forks: 2,291
Language: Java
License: Apache-2.0
Last pushed: Nov 19, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/NLPchina/ansj_seg"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
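
For Java pipelines, the same request can be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11+). The class and method names below (`QualityApiClient`, `buildUri`, `buildRequest`, `fetch`) are hypothetical, and the `/quality/{category}/{owner}/{repo}` path layout is an assumption inferred from the single curl URL above. The response format is not documented here, so the body is returned as a raw string rather than parsed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper; only the endpoint URL itself comes from the page.
class QualityApiClient {
    static final String BASE = "https://pt-edge.onrender.com/api/v1/quality";

    // Assumed path layout: /quality/{category}/{owner}/{repo}.
    static URI buildUri(String category, String owner, String repo) {
        return URI.create(BASE + "/" + category + "/" + owner + "/" + repo);
    }

    // Plain GET with a timeout; no API key is sent (anonymous tier).
    static HttpRequest buildRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Sends the request and returns the raw response body.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(uri), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        URI uri = buildUri("nlp", "NLPchina", "ansj_seg");
        try {
            System.out.println(fetch(uri));
        } catch (IOException | InterruptedException e) {
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```

With the anonymous limit of 100 requests/day, a caller polling many repositories would likely want to cache responses rather than refetch on every run.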