ku-nlp/jumanpp-jumandic
Scripts for training Jumandic Juman++ model
This tool helps developers working on Japanese natural language processing build a custom Juman++ model tailored to the Jumandic dictionary. You provide text corpora and dictionary entries, and it generates a ready-to-use Juman++ model. It is aimed at developers and NLP engineers whose applications require precise Japanese morphological analysis and text parsing.
No commits in the last 6 months.
Use this if you need to create a specialized Juman++ morphological analyzer with custom vocabulary for Japanese text processing applications.
Not ideal if you're an end user looking for a pre-trained, ready-to-use Japanese NLP tool that requires no custom model training.
Stars: 7
Forks: —
Language: Makefile
License: —
Category:
Last pushed: Sep 15, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ku-nlp/jumanpp-jumandic"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
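The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `<category>/<owner>/<repo>` path layout is inferred from the single URL shown above, and the JSON response format is an assumption, since this page does not document it.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout is inferred, not documented)."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the endpoint returns a JSON body; adjust the parsing
    if the actual response differs.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to perform the request.
    print(build_quality_url("nlp", "ku-nlp", "jumanpp-jumandic"))
```

Keeping URL construction separate from the network call makes it easy to check the request target without using up the daily request quota.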
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
natasha/razdel
Rule-based token, sentence segmentation for Russian language