Japanese Text Processing NLP Tools
Tools for Japanese-specific morphological analysis, text normalization, kana-kanji conversion, and character processing. Does NOT include general multilingual NLP, machine translation systems, or language learning applications (unless text processing is the primary focus).
There are 96 Japanese text processing tools tracked. One scores above 70 (Verified tier). The highest-rated is EmilStenstrom/conllu at 76/100, with 320 stars and 473,236 monthly downloads.
Get all 96 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=japanese-text-processing&limit=20"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
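The same request can be made from Python. The sketch below rebuilds the URL from the curl example using only the standard library. The helper names `build_url` and `fetch_json` are illustrative, not part of the API, and the response schema is not documented on this page, so the code returns the decoded JSON without assuming any field names. Whether `limit` can be raised to cover all 96 projects is also not stated here.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="japanese-text-processing", limit=20):
    """Build the query URL from the three parameters shown in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the body as JSON.

    The structure of the returned object is an unknown here; inspect it
    before relying on any particular keys.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(build_url())` issues the keyless request, which counts against the 100 requests/day allowance mentioned above.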
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | EmilStenstrom/conllu<br>A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a... | 76 | Verified |
| 2 | OpenPecha/Botok<br>🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python |  | Established |
| 3 | taishi-i/nagisa<br>A Japanese tokenizer based on recurrent neural networks |  | Established |
| 4 | zaemyung/sentsplit<br>A flexible sentence segmentation library using CRF model and regex rules |  | Established |
| 5 | natasha/razdel<br>Rule-based token, sentence segmentation for Russian language |  | Established |
| 6 | polm/cutlet<br>Japanese to romaji converter in Python |  | Established |
| 7 | ku-nlp/rhoknp<br>Yet another Python binding for Juman++/KNP/KWJA |  | Established |
| 8 | azooKey/AzooKeyKanaKanjiConverter<br>Kana-Kanji Conversion Module written in Swift, supporting Neural Kana-Kanji... |  | Emerging |
| 9 | textlint-rule/sentence-splitter<br>Split {Japanese, English} text into sentences. |  | Emerging |
| 10 | PKSHATechnology-Research/tdmelodic<br>A Japanese accent dictionary generator |  | Emerging |
| 11 | javierarce/silabea<br>Node package that splits Spanish words into syllables. |  | Emerging |
| 12 | himkt/konoha<br>🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to... |  | Emerging |
| 13 | rabbit19981023/yomigana-ebook<br>The fastest converter to add furigana (readings) to Japanese epub eBooks |  | Emerging |
| 14 | ikegami-yukino/mozcpy<br>Kana-Kanji converter using Mozc dictionary |  | Emerging |
| 15 | andreihar/taibun<br>Taiwanese Hokkien Transliterator and Tokeniser |  | Emerging |
| 16 | LibreTranslate/MiniSBD<br>Free and open source library for fast sentence boundary detection |  | Emerging |
| 17 | gold-silver-copper/english<br>World's most accurate and fast procedural English conjugation library |  | Emerging |
| 18 | togatoga/karukan<br>Japanese Input Method System for Linux, Neural Kana-Kanji Conversion Engine... |  | Emerging |
| 19 | akaza-im/akaza<br>Yet another Japanese IME for IBus/Linux |  | Emerging |
| 20 | mkartawijaya/dango<br>An easy to use tokenizer for Japanese text, aimed at language learners and... |  | Emerging |
| 21 | wwwcojp/ja_sentence_segmenter<br>Japanese sentence segmentation library for Python |  | Emerging |
| 22 | polm/fugashi<br>A Cython MeCab wrapper for fast, pythonic Japanese tokenization and... |  | Emerging |
| 23 | KoichiYasuoka/SuPar-UniDic<br>Tokenizer POS-tagger Lemmatizer and Dependency-parser for modern and... |  | Emerging |
| 24 | kevincobain2000/jProcessing<br>Japanese Natural Language Processing Libraries |  | Emerging |
| 25 | azu/morpheme-match<br>Match function that matches tokens (morphological analysis) against a sentence. |  | Emerging |
| 26 | fnl/syntok<br>Text tokenization and sentence segmentation (segtok v2) |  | Emerging |
| 27 | SOMJANG/Mecab-ko-for-Google-Colab<br>Use the MeCab library (NLP library) in Google Colab |  | Emerging |
| 28 | miurahr/pykakasi<br>Lightweight converter from Japanese Kana-kanji sentences into Kana-Roman. |  | Emerging |
| 29 | ikegami-yukino/neologdn<br>Japanese text normalizer for mecab-neologd |  | Emerging |
| 30 | ku-nlp/jumanpp<br>Juman++ (a Morphological Analyzer Toolkit) |  | Emerging |
| 31 | mediacloud/sentence-splitter<br>Text to sentence splitter using heuristic algorithm by Philipp Koehn and... |  | Emerging |
| 32 | hamanlp/hama<br>🦛 Hangul Morphological Analyzer |  | Emerging |
| 33 | andreihar/taibun.js<br>Taiwanese Hokkien Transliterator and Tokeniser |  | Emerging |
| 34 | Kensuke-Mitsuzawa/JapaneseTokenizers<br>Aims to make JapaneseTokenizer as easy to use as possible |  | Emerging |
| 35 | ikegami-yukino/neologdn-java<br>Japanese text normalizer for mecab-neologd |  | Emerging |
| 36 | craigtrim/fast-sentence-segment<br>Fast and Efficient Sentence Segmentation |  | Emerging |
| 37 | koshort/pyeunjeon<br>(deprecated) Standalone Python interface to the Korean morphological analyzer based on the Eunjeon (은전한닢) project and MeCab |  | Emerging |
| 38 | LanguageMachines/mbt<br>MBT: Memory-based tagger generation and tagging MBT is a memory-based... |  | Emerging |
| 39 | alinear-corp/kuzukiri<br>Japanese Text Segmenter for Python written in Rust |  | Emerging |
| 40 | ABTdomain/dksplit<br>DKSplit: fast word segmentation for Python. Split domain names and... |  | Emerging |
| 41 | neelguha/legal-segmenter<br>A simple library for segmenting legal texts |  | Emerging |
| 42 | loomchild/segment<br>Program used to split text into segments |  | Emerging |
| 43 | thammin/juman-bin<br>A user-extensible morphological analyzer for Japanese (Japanese morphological analysis system) |  | Emerging |
| 44 | mkpoli/ainconv<br>A JavaScript package to convert between Ainu writing systems |  | Emerging |
| 45 | gpizzorno/conllu_tools<br>A Python toolkit for working with CoNLL-U files, Universal Dependencies... |  | Emerging |
| 46 | LR-POR/cl-conllu<br>Tool for working with CoNLL-U files in CL |  | Experimental |
| 47 | medspacy/sectionizer<br>A rule-based Python module for splitting documents into sections. |  | Experimental |
| 48 | typedgrammar/typed-japanese<br>🌸 Learn Japanese grammar with TypeScript |  | Experimental |
| 49 | ejossev/hypherator-java<br>Java Hyphenation Iterator |  | Experimental |
| 50 | hephaex/mecab-ko<br>MeCab-Ko: a Korean morphological analyzer implemented in Rust. Sejong corpus compatible, 97% accuracy. |  | Experimental |
| 51 | ku-nlp/knp<br>A Japanese Parser |  | Experimental |
| 52 | tokuhirom/jawiki-kana-kanji-dict<br>Generate SKK/MeCab dictionary from Wikipedia (Japanese edition) |  | Experimental |
| 53 | yoshoku/suika<br>Suika 🍉 is a Japanese morphological analyzer written in pure Ruby |  | Experimental |
| 54 | cronokirby/ginkou<br>Japanese sentence bank program. Add and find sentences for language learning. |  | Experimental |
| 55 | retarfi/jptranstokenizer<br>Japanese Tokenizer for transformers library |  | Experimental |
| 56 | junhewk/RcppMeCab<br>RcppMeCab: Rcpp Interface of CJK Morpheme Analyzer MeCab |  | Experimental |
| 57 | tasukuigarashi/j-liwc2015<br>Japanese version of LIWC2015 |  | Experimental |
| 58 | luxiant/sentence_segmentation<br>A rule-based sentence_segmenter, inspired by ruby pragmatic segmenter by... |  | Experimental |
| 59 | taipalogy/taipa<br>Taiwanese morphological parsing |  | Experimental |
| 60 | ArthurDevNL/CoNLL-U<br>A lightweight NuGet package for parsing CoNLL-U files in C# |  | Experimental |
| 61 | uribo/sudachir<br>R Interface to 'Sudachi' |  | Experimental |
| 62 | agatan/yoin<br>A Japanese Morphological Analyzer written in pure Rust |  | Experimental |
| 63 | KOLANICH-libs/WordSplitAbs.py<br>An abstraction layer around word splitters for Python |  | Experimental |
| 64 | NonJishoKei/NonJishoKei<br>[WIP] This is a lightweight morphological analyzer designed for Japanese... |  | Experimental |
| 65 | azagniotov/solr-lucene-analyzer-sudachi<br>A Japanese morphological analyzer Sudachi as a Solr plugin. |  | Experimental |
| 66 | MiguelNecoechea/Complexa<br>Yet another Chrome extension for learning Japanese |  | Experimental |
| 67 | apakabarlabs/syllabreak-swift<br>Multilingual library for accurate and deterministic hyphenation and syllable... |  | Experimental |
| 68 | junhewk/RmecabKo<br>RmecabKo: R wrapper for eunjeon project (mecab-ko) |  | Experimental |
| 69 | rmalouf/treesearch<br>High-performance toolkit for querying linguistic dependency parses |  | Experimental |
| 70 | yongsk0066/corevoikko<br>Finnish spell checker, morphological analyzer, and grammar checker, Rust +... |  | Experimental |
| 71 | jeffhuen/plurality<br>Fast English plural and singular noun inflection for Elixir. Convert plural... |  | Experimental |
| 72 | tchin25/japanese-dependency-visualizer<br>A dependency visualizer for Japanese to help beginners deconstruct complex... |  | Experimental |
| 73 | hppRC/jawiki-cleaner<br>🧹 Japanese Wikipedia Cleaner 🧹 |  | Experimental |
| 74 | cryshin22/Cutlet-Japan<br>Japanese to romaji converter in Python |  | Experimental |
| 75 | mkpoli/ainu-wiktionary<br>Input assistance tool for the Ainu-language Wiktionary |  | Experimental |
| 76 | atsumari-io/mecab-service<br>Web app for tokenizing Japanese text using MeCab |  | Experimental |
| 77 | GINK03/boosting-tree-tokenizer<br>Gradient Boosting Decision... |  | Experimental |
| 78 | btrkeks/jp-deinflector<br>A high-performance Rust crate for deinflecting Japanese words using perfect... |  | Experimental |
| 79 | megagonlabs/desuwa<br>Feature annotator to morphemes and phrases based on KNP rule files (pure-Python) |  | Experimental |
| 80 | rakutentech/pisah<br>Sentence Splitter Library (C++ port of pySBD) |  | Experimental |
| 81 | tetutaro/mecab_dictionaries<br>Create various dictionaries for MeCab and MeCab CLI using fugashi |  | Experimental |
| 82 | BrambleXu/jp-stopword-filter<br>A lightweight Python library designed to filter stopwords from Japanese text... |  | Experimental |
| 83 | cronokirby/nicer-mecab<br>Japanese morphological analysis. Wrapper over mecab. |  | Experimental |
| 84 | akiomik/vibrato-dict-ipa-neologd<br>A compiled mecab-ipadic-neologd dictionary for vibrato |  | Experimental |
| 85 | whelk-io/hy-phen-a-tion<br>Java OSS library for calculating syllables and hyphenation based on Frank... |  | Experimental |
| 86 | Shusei-E/RcppJagger<br>RcppJagger is a wrapper package for Jagger |  | Experimental |
| 87 | proycon/hyphertool<br>Command-line tool for syllabification and hyphenisation for multiple languages |  | Experimental |
| 88 | bureaucratic-labs/conllu<br>CoNLL-U format parser |  | Experimental |
| 89 | ru-ka/syllable-divider<br>A WebAssembly library for syllable division in XML/DOM trees |  | Experimental |
| 90 | apakabarlabs/syllabreak-kotlin<br>Kotlin library for multilingual syllabification and hyphenation |  | Experimental |
| 91 | QuyAnh2005/StyleTTS-VC-Japanese<br>StyleTTS Voice Conversions for Japanese |  | Experimental |
| 92 | evamaxfield/cue-queue<br>Transcript segmentation using the average semantic encodings of cue sentences. |  | Experimental |
| 93 | milovatjp/hazuki<br>Japanese complexity analysis app within JLPT framework. |  | Experimental |
| 94 | TaygaHoshi/japanese-i-plus-one-filter<br>Finds i+1 sentences for a specific word from Jisho.org. |  | Experimental |
| 95 | d108/Samazama<br>Save keystrokes for iOS and macOS users by comparing shorthand input against... |  | Experimental |
| 96 | luckasRanarison/kaiseki<br>A Japanese tokenizer and morphological analyzer |  | Experimental |