lovit/soynlp

A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing.

/ 100

Verified

soynlp takes an unsupervised, statistical approach: it uses cohesion scores, branching entropy, and accessor variety to find word boundaries and identify nouns directly from corpus patterns, without labeled training data. It provides several tokenizers (LTokenizer, which splits each token into a left and a right part; MaxScoreTokenizer; and RegexTokenizer) and a pointwise mutual information (PMI) module for analyzing word co-occurrence. It also integrates with the complementary libraries soyspacing, for spacing correction, and KR-WordRank, for keyword extraction.
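To make the cohesion-score idea concrete, here is a minimal pure-Python sketch. It does not call soynlp and is not the library's implementation. It assumes the commonly cited forward-cohesion definition, (count of the full prefix / count of its first character) raised to 1/(n-1), and every function name and the toy corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_left_substrings(tokens, max_len=6):
    """Count every left-anchored substring (prefix) of each token."""
    counts = Counter()
    for token in tokens:
        for end in range(1, min(len(token), max_len) + 1):
            counts[token[:end]] += 1
    return counts


def forward_cohesion(word, counts):
    """Assumed definition: (count(word) / count(word[0])) ** (1 / (n - 1))."""
    n = len(word)
    if n < 2 or counts[word] == 0:
        return 0.0
    return (counts[word] / counts[word[0]]) ** (1 / (n - 1))


def split_left_right(token, counts):
    """Pick the prefix of length >= 2 with the highest cohesion as the left part."""
    best_end = max(
        range(2, len(token) + 1),
        key=lambda end: forward_cohesion(token[:end], counts),
        default=len(token),  # tokens of length 1 are returned whole
    )
    return token[:best_end], token[best_end:]


# Toy corpus (illustrative only): one noun followed by different particles.
sample_tokens = ["파이썬을", "파이썬은", "파이썬으로", "파이썬", "파도가"]
counts = count_left_substrings(sample_tokens)

left, right = split_left_right("파이썬을", counts)
print(left, right)
```

On this toy corpus the prefix "파이썬" scores higher than both the shorter "파이" and the longer "파이썬을", so the token splits into a noun-like left part and a particle-like right part. A real corpus needs far more data and frequency thresholds before the scores are meaningful.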

984 stars and 122,443 monthly downloads. Available on PyPI.

Maintenance 13 / 25

Adoption 20 / 25

Maturity 25 / 25

Community 24 / 25

Stars: 984

Forks: 184

Language: Python

License: —

Related tools

bab2min/kiwipiepy

Python API for Kiwi

hyunwoongko/kss

KSS: Korean String processing Suite

bab2min/Kiwi

Kiwi (intelligent Korean morphological analyzer)

JDongian/python-jamo

Hangul syllable decomposition and synthesis using jamo.

shineware/KOMORAN

Korean Morphological Analyzer by shineware
