polm/fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Supports arbitrary MeCab dictionaries beyond UniDic by wrapping each dictionary's feature fields in a named tuple, and ships pre-built wheels for Linux, macOS, and Windows so that installation does not require compiling native code. Analysis results are exposed as structured Python objects (surface form, POS tags, lemmas) rather than raw tuples, and both streaming parse operations and batch tokenization are supported.
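As a rough illustration of the structured access described above, the following sketch collects the surface form, lemma, and POS tags for each word. It assumes fugashi is installed together with a UniDic dictionary (for example the `unidic-lite` package). `Token`, `build_tagger`, `analyze`, and `demo` are hypothetical helper names introduced here and are not part of fugashi.

```python
from collections import namedtuple

# Hypothetical helper type for this sketch; not part of fugashi.
Token = namedtuple("Token", "surface lemma pos")


def build_tagger(mecab_args=""):
    """Create a fugashi Tagger (assumes fugashi and a UniDic dictionary are installed)."""
    # Imported lazily so that analyze() can be exercised without fugashi present.
    from fugashi import Tagger
    return Tagger(mecab_args)


def analyze(tagger, text):
    """Return one Token per word produced by calling the tagger on the text."""
    return [Token(w.surface, w.feature.lemma, w.pos) for w in tagger(text)]


def demo():
    """Print a tab-separated line per word for a sample sentence."""
    tagger = build_tagger()
    for token in analyze(tagger, "麩菓子は、麩を主材料とした日本の菓子。"):
        print(token.surface, token.lemma, token.pos, sep="\t")
```

Calling `demo()` prints one line per word. The `lemma` field and the `pos` property are specific to UniDic-style dictionaries, so a different dictionary would need different field names.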
Stars: 515
Forks: 39
Language: C++
License: MIT
Category:
Last pushed: Oct 24, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/polm/fugashi"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
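The same request can be made from Python with the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that other repositories follow the same `category/owner/repo` path pattern as the URL above. `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (assumes the category/owner/repo pattern generalizes)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response (assumes the body is JSON)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For this page, `fetch_quality("nlp", "polm", "fugashi")` would request the URL shown in the curl command.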
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
natasha/razdel
Rule-based token and sentence segmentation for the Russian language