lovit/soynlp

A Python library for Korean natural language processing. It provides word extraction, tokenizers, part-of-speech tagging, and preprocessing.

82 / 100 (Verified)

Technical summary (based on the README): soynlp uses unsupervised statistical methods, namely cohesion scores, branching entropy, and accessor variety, to find word boundaries and extract nouns directly from corpus patterns, without labeled training data. It provides several tokenizers (LTokenizer, which splits each eojeol into a left part and a right part; MaxScoreTokenizer; and RegexTokenizer) and a pointwise mutual information (PMI) module for analyzing word co-occurrence. It is complemented by companion libraries such as soyspacing (spacing correction) and KR-WordRank (keyword extraction).
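
To make the boundary statistics and PMI concrete, here is a minimal, self-contained sketch in plain Python. It follows the textbook definitions and does not reproduce soynlp's own classes. Every function name (`prefix_counts`, `cohesion`, `right_branching_entropy`, `right_accessor_variety`, `sentence_pmi`) is hypothetical. The sketch makes several simplifying assumptions: eojeols (whitespace-delimited Korean word units) are taken to be whitespace-separated tokens, only the right-hand (forward) direction is computed, and PMI is estimated from sentence-level co-occurrence. The toy corpus is invented.

```python
import math
from collections import Counter


def prefix_counts(eojeols, max_len=10):
    """Count every left-anchored substring (L part) of each eojeol, up to max_len characters."""
    counts = Counter()
    for eojeol in eojeols:
        for i in range(1, min(len(eojeol), max_len) + 1):
            counts[eojeol[:i]] += 1
    return counts


def cohesion(subword, counts):
    """Forward cohesion: geometric mean of P(c1..ci | c1..c(i-1)) for i = 2..n,
    which telescopes to (count(c1..cn) / count(c1)) ** (1 / (n - 1))."""
    n = len(subword)
    if n < 2 or counts[subword[0]] == 0:
        return 0.0
    return (counts[subword] / counts[subword[0]]) ** (1 / (n - 1))


def right_neighbors(subword, eojeols):
    """Characters that immediately follow `subword` when it is a proper prefix of an eojeol."""
    return Counter(
        e[len(subword)] for e in eojeols
        if e.startswith(subword) and len(e) > len(subword)
    )


def right_branching_entropy(subword, eojeols):
    """Shannon entropy (natural log) of the right-neighbor character distribution."""
    neighbors = right_neighbors(subword, eojeols)
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in neighbors.values())


def right_accessor_variety(subword, eojeols):
    """Number of distinct characters that follow `subword`."""
    return len(right_neighbors(subword, eojeols))


def sentence_pmi(x, y, sentences):
    """PMI from sentence-level co-occurrence: log(N * n_xy / (n_x * n_y))."""
    bags = [set(s.split()) for s in sentences]
    n_x = sum(x in b for b in bags)
    n_y = sum(y in b for b in bags)
    n_xy = sum(x in b and y in b for b in bags)
    if n_xy == 0:
        return float("-inf")  # the two words never co-occur
    return math.log(len(bags) * n_xy / (n_x * n_y))


# Toy corpus (made up for illustration)
eojeols = ["아이오아이는", "아이오아이가", "아이오아이의", "아이오아이는", "아이돌이", "아이돌은"]
counts = prefix_counts(eojeols)
print(cohesion("아이오아이", counts))                 # equals (4/6) ** (1/4)
print(right_accessor_variety("아이오아이", eojeols))  # 3 distinct followers: 는, 가, 의
print(right_branching_entropy("아이오아이", eojeols))

sentences = ["아이오아이 신곡 발표", "아이오아이 신곡 공개", "날씨 맑음", "신곡 차트"]
print(sentence_pmi("아이오아이", "신곡", sentences))  # log(4 * 2 / (2 * 3)) = log(4/3)
```

A high cohesion score means the characters of a subword tend to occur together. High right-side branching entropy and accessor variety suggest that a word boundary follows the subword. A word extractor would typically combine such scores with frequency thresholds before accepting a candidate.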

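The three tokenizer styles named in the summary (LTokenizer, MaxScoreTokenizer, RegexTokenizer) can be illustrated with simplified, hypothetical re-implementations. These are sketches built on stated assumptions, not soynlp's actual code. The function names are made up. The score dictionaries are hand-written stand-ins for scores a word extractor would produce. The regular expression distinguishes only Hangul syllables, Hangul jamo, Latin letters, digits, and other symbols. An eojeol here means a whitespace-delimited Korean word unit.

```python
import re


def l_tokenize(sentence, scores):
    """Split each eojeol into L + R, choosing the L part (prefix) with the highest score."""
    tokens = []
    for eojeol in sentence.split():
        candidates = [(scores[eojeol[:i]], i) for i in range(1, len(eojeol) + 1)
                      if eojeol[:i] in scores]
        if not candidates:
            tokens.append(eojeol)  # no known L part: keep the eojeol whole
            continue
        _, cut = max(candidates)  # highest score; ties go to the longer L part
        tokens.append(eojeol[:cut])
        if cut < len(eojeol):
            tokens.append(eojeol[cut:])
    return tokens


def max_score_tokenize(sentence, scores, max_len=10):
    """Greedily keep the highest-scoring non-overlapping known subwords; leftovers become tokens."""
    tokens = []
    for eojeol in sentence.split():
        n = len(eojeol)
        # (score, length, start) for every known substring, best first
        candidates = sorted(
            ((scores[eojeol[b:e]], e - b, b)
             for b in range(n)
             for e in range(b + 1, min(n, b + max_len) + 1)
             if eojeol[b:e] in scores),
            reverse=True,
        )
        covered = [False] * n
        chosen = []
        for _, length, b in candidates:
            if not any(covered[b:b + length]):
                chosen.append((b, b + length))
                covered[b:b + length] = [True] * length
        pos = 0
        for b, e in sorted(chosen):
            if pos < b:
                tokens.append(eojeol[pos:b])  # uncovered gap becomes its own token
            tokens.append(eojeol[b:e])
            pos = e
        if pos < n:
            tokens.append(eojeol[pos:])
    return tokens


# Runs of a single character type: Hangul syllables, jamo, Latin letters, numbers, other symbols
_CHAR_TYPE_RUNS = re.compile(
    r"[가-힣]+|[ㄱ-ㅎㅏ-ㅣ]+|[a-zA-Z]+|[0-9]+(?:\.[0-9]+)?|[^\s가-힣ㄱ-ㅎㅏ-ㅣa-zA-Z0-9]+"
)


def regex_tokenize(sentence):
    """Split wherever the character type changes."""
    return _CHAR_TYPE_RUNS.findall(sentence)


print(l_tokenize("아이오아이는 신곡을 발표했다", {"아이": 0.3, "아이오아이": 0.8, "신곡": 0.7}))
# ['아이오아이', '는', '신곡', '을', '발표했다']
print(max_score_tokenize("난파스타가좋아요", {"파스": 0.3, "파스타": 0.7, "좋아요": 0.2, "좋아": 0.5}))
# ['난', '파스타', '가', '좋아', '요']
print(regex_tokenize("abc123가나ㅋㅋ!!"))
# ['abc', '123', '가나', 'ㅋㅋ', '!!']
```

The L-style split assumes that an eojeol usually consists of a content word followed by a particle or ending. The max-score approach searches anywhere inside a string, so it can also segment text written without spaces, as the second example shows.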
984 stars and 122,443 monthly downloads. Available on PyPI.

Maintenance 13 / 25
Adoption 20 / 25
Maturity 25 / 25
Community 24 / 25


Stars: 984
Forks: 184
Language: Python
License: (not listed)
Last pushed: Mar 10, 2026
Monthly downloads: 122,443
Commits (30d): 0
Dependencies: 4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/lovit/soynlp"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.