lovit/soynlp
A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Uses unsupervised statistical measures (cohesion scores, branching entropy, and accessor variety) to extract word boundaries and identify nouns directly from corpus patterns, without labeled training data. Provides several tokenizers (LTokenizer for left-right decomposition of each token, MaxScoreTokenizer, and RegexTokenizer) and a pointwise mutual information (PMI) module for analyzing word co-occurrence patterns. It works alongside the complementary libraries soyspacing (spacing correction) and KR-WordRank (keyword extraction).
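To make the cohesion-score idea concrete, here is a minimal pure-Python sketch. It is not soynlp's implementation or API: the helper names (`count_left_substrings`, `cohesion_forward`, `l_tokenize`) are hypothetical, and the formula assumes the commonly described forward cohesion, (count(c1..cn) / count(c1)) ** (1 / (n - 1)). The tokenizer simply splits each space-delimited token at its highest-scoring left substring, which is one plausible reading of left-right decomposition.

```python
from collections import Counter


def count_left_substrings(tokens, max_len=10):
    """Count every left-anchored substring (prefix) of each token."""
    counts = Counter()
    for tok in tokens:
        for end in range(1, min(len(tok), max_len) + 1):
            counts[tok[:end]] += 1
    return counts


def cohesion_forward(word, counts):
    """Assumed forward cohesion: (count(word) / count(first char)) ** (1 / (n - 1)).

    Returns 0.0 for single characters and for unseen words.
    """
    n = len(word)
    if n < 2 or counts[word] == 0:
        return 0.0
    return (counts[word] / counts[word[0]]) ** (1.0 / (n - 1))


def l_tokenize(sentence, scores):
    """Split each space-delimited token into (L, R) at its best-scoring prefix.

    Ties are broken in favor of the longer prefix, so a token with no
    scored prefix stays whole with an empty R part.
    """
    pairs = []
    for tok in sentence.split():
        best_end = max(
            range(1, len(tok) + 1),
            key=lambda end: (scores.get(tok[:end], 0.0), end),
        )
        pairs.append((tok[:best_end], tok[best_end:]))
    return pairs


# Toy corpus of space-delimited tokens (illustrative data only).
corpus = ["파이썬을", "파이썬은", "파이썬으로", "파이썬", "파도가"]
counts = count_left_substrings(corpus)
scores = {w: cohesion_forward(w, counts) for w in counts}

print(l_tokenize("파이썬으로 파이썬은", scores))
# [('파이썬', '으로'), ('파이썬', '은')]
```

In this toy corpus the prefix "파이썬" scores higher than both the shorter "파이" and the longer forms with particles attached, so the split falls at the noun boundary without any labeled data. A real corpus would need frequency thresholds and, typically, a second signal such as branching entropy to filter out noisy candidates.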
984 stars and 122,443 monthly downloads. Available on PyPI.
Stars: 984
Forks: 184
Language: Python
License: not listed
Category:
Last pushed: Mar 10, 2026
Monthly downloads: 122,443
Commits (30d): 0
Dependencies: 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/lovit/soynlp"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.