Japanese Text Processing NLP Tools

Tools for Japanese-specific morphological analysis, text normalization, kana-kanji conversion, and character processing. Does NOT include general multilingual NLP, machine translation systems, or language learning applications (unless text processing is the primary focus).

96 Japanese text processing tools are tracked. One scores above 70 (Verified tier). The highest-rated is EmilStenstrom/conllu at 76/100, with 320 stars and 473,236 monthly downloads.

Get all 96 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=japanese-text-processing&limit=20"

The API is open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
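The same request can be made from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the function names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the decoded JSON is returned without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="japanese-text-processing", limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the body as JSON.

    The response schema is not documented here, so the decoded value is
    returned as-is rather than assuming particular field names.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example (performs a network request and counts against the daily quota):
# data = fetch_projects(build_url())
# print(json.dumps(data, indent=2, ensure_ascii=False)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the query string, or to change `limit`, without spending a request.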

# Tool Description Score Tier
1 EmilStenstrom/conllu

A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a...

76
Verified
2 OpenPecha/Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

68
Established
3 taishi-i/nagisa

A Japanese tokenizer based on recurrent neural networks

63
Established
4 zaemyung/sentsplit

A flexible sentence segmentation library using CRF model and regex rules

60
Established
5 natasha/razdel

Rule-based token and sentence segmentation for the Russian language

59
Established
6 polm/cutlet

Japanese to romaji converter in Python

58
Established
7 ku-nlp/rhoknp

Yet another Python binding for Juman++/KNP/KWJA

56
Established
8 azooKey/AzooKeyKanaKanjiConverter

Kana-Kanji Conversion Module written in Swift, supporting Neural Kana-Kanji...

49
Emerging
9 textlint-rule/sentence-splitter

Split {Japanese, English} text into sentences.

47
Emerging
10 PKSHATechnology-Research/tdmelodic

A Japanese accent dictionary generator

46
Emerging
11 javierarce/silabea

Node package that splits Spanish words into syllables.

45
Emerging
12 himkt/konoha

🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to...

44
Emerging
13 rabbit19981023/yomigana-ebook

The fastest converter to add furigana (readings) to Japanese epub eBooks

42
Emerging
14 ikegami-yukino/mozcpy

Kana-Kanji converter using Mozc dictionary

42
Emerging
15 andreihar/taibun

Taiwanese Hokkien Transliterator and Tokeniser

42
Emerging
16 LibreTranslate/MiniSBD

Free and open source library for fast sentence boundary detection

41
Emerging
17 gold-silver-copper/english

World's most accurate and fast procedural English conjugation library

41
Emerging
18 togatoga/karukan

Japanese Input Method System for Linux, Neural Kana-Kanji Conversion Engine...

41
Emerging
19 akaza-im/akaza

Yet another Japanese IME for IBus/Linux

41
Emerging
20 mkartawijaya/dango

An easy to use tokenizer for Japanese text, aimed at language learners and...

40
Emerging
21 wwwcojp/ja_sentence_segmenter

Japanese sentence segmentation library for Python

40
Emerging
22 polm/fugashi

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and...

40
Emerging
23 KoichiYasuoka/SuPar-UniDic

Tokenizer POS-tagger Lemmatizer and Dependency-parser for modern and...

40
Emerging
24 kevincobain2000/jProcessing

Japanese Natural Language Processing Libraries

39
Emerging
25 azu/morpheme-match

A match function that matches tokens (morphological analysis) against a sentence.

39
Emerging
26 fnl/syntok

Text tokenization and sentence segmentation (segtok v2)

38
Emerging
27 SOMJANG/Mecab-ko-for-Google-Colab

Use the MeCab library (an NLP library) in Google Colab

38
Emerging
28 miurahr/pykakasi

Lightweight converter from Japanese Kana-kanji sentences into Kana-Roman.

37
Emerging
29 ikegami-yukino/neologdn

Japanese text normalizer for mecab-neologd

37
Emerging
30 ku-nlp/jumanpp

Juman++ (a Morphological Analyzer Toolkit)

37
Emerging
31 mediacloud/sentence-splitter

Text to sentence splitter using heuristic algorithm by Philipp Koehn and...

37
Emerging
32 hamanlp/hama

🦛 Hangul Morphological Analyzer

37
Emerging
33 andreihar/taibun.js

Taiwanese Hokkien Transliterator and Tokeniser

36
Emerging
34 Kensuke-Mitsuzawa/JapaneseTokenizers

Aims to make using JapaneseTokenizer as easy as possible

36
Emerging
35 ikegami-yukino/neologdn-java

Japanese text normalizer for mecab-neologd

36
Emerging
36 craigtrim/fast-sentence-segment

Fast and Efficient Sentence Segmentation

36
Emerging
37 koshort/pyeunjeon

(deprecated) Standalone Python interface for the Korean morphological analyzer based on the Eunjeon (은전한닢) project and MeCab

35
Emerging
38 LanguageMachines/mbt

MBT: Memory-based tagger generation and tagging MBT is a memory-based...

34
Emerging
39 alinear-corp/kuzukiri

Japanese Text Segmenter for Python written in Rust

34
Emerging
40 ABTdomain/dksplit

DKSplit — fast word segmentation for Python. Split domain names and...

34
Emerging
41 neelguha/legal-segmenter

A simple library for segmenting legal texts

33
Emerging
42 loomchild/segment

Program used to split text into segments

33
Emerging
43 thammin/juman-bin

A user-extensible morphological analyzer for Japanese. Japanese morphological analysis system.

33
Emerging
44 mkpoli/ainconv

A JavaScript package to convert between Ainu writing systems

31
Emerging
45 gpizzorno/conllu_tools

A Python toolkit for working with CoNLL-U files, Universal Dependencies...

30
Emerging
46 LR-POR/cl-conllu

Tool for working with CoNLL-U files in Common Lisp

29
Experimental
47 medspacy/sectionizer

A rule-based Python module for splitting documents into sections.

29
Experimental
48 typedgrammar/typed-japanese

🌸 Learn Japanese grammar with TypeScript

29
Experimental
49 ejossev/hypherator-java

Java Hyphenation Iterator

28
Experimental
50 hephaex/mecab-ko

MeCab-Ko: a Korean morphological analyzer implemented in Rust. Sejong corpus compatible, 97% accuracy.

27
Experimental
51 ku-nlp/knp

A Japanese Parser

27
Experimental
52 tokuhirom/jawiki-kana-kanji-dict

Generate SKK/MeCab dictionary from Wikipedia (Japanese edition)

27
Experimental
53 yoshoku/suika

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby

26
Experimental
54 cronokirby/ginkou

Japanese sentence bank program. Add and find sentences for language learning.

26
Experimental
55 retarfi/jptranstokenizer

Japanese Tokenizer for transformers library

25
Experimental
56 junhewk/RcppMeCab

RcppMeCab: Rcpp Interface of CJK Morpheme Analyzer MeCab

25
Experimental
57 tasukuigarashi/j-liwc2015

Japanese version of LIWC2015

25
Experimental
58 luxiant/sentence_segmentation

A rule-based sentence_segmenter, inspired by ruby pragmatic segmenter by...

25
Experimental
59 taipalogy/taipa

Taiwanese morphological parsing

24
Experimental
60 ArthurDevNL/CoNLL-U

A lightweight NuGet package for parsing CoNLL-U files in C#

24
Experimental
61 uribo/sudachir

R Interface to 'Sudachi'

23
Experimental
62 agatan/yoin

A Japanese Morphological Analyzer written in pure Rust

23
Experimental
63 KOLANICH-libs/WordSplitAbs.py

An abstraction layer around word splitters for python

22
Experimental
64 NonJishoKei/NonJishoKei

[WIP] This is a lightweight morphological analyzer designed for Japanese...

22
Experimental
65 azagniotov/solr-lucene-analyzer-sudachi

A Japanese morphological analyzer Sudachi as a Solr plugin.

22
Experimental
66 MiguelNecoechea/Complexa

Yet another Chrome extension for learning Japanese

21
Experimental
67 apakabarlabs/syllabreak-swift

Multilingual library for accurate and deterministic hyphenation and syllable...

21
Experimental
68 junhewk/RmecabKo

RmecabKo: R wrapper for eunjeon project (mecab-ko)

21
Experimental
69 rmalouf/treesearch

High-performance toolkit for querying linguistic dependency parses

21
Experimental
70 yongsk0066/corevoikko

Finnish spell checker, morphological analyzer, and grammar checker — Rust +...

20
Experimental
71 jeffhuen/plurality

Fast English plural and singular noun inflection for Elixir. Convert plural...

19
Experimental
72 tchin25/japanese-dependency-visualizer

A dependency visualizer for Japanese to help beginners deconstruct complex...

18
Experimental
73 hppRC/jawiki-cleaner

🧹Japanese Wikipedia Cleaner 🧹

18
Experimental
74 cryshin22/Cutlet-Japan

Japanese to romaji converter in Python

17
Experimental
75 mkpoli/ainu-wiktionary

Input assistance tool for the Ainu-language Wiktionary

15
Experimental
76 atsumari-io/mecab-service

Web app for tokenizing Japanese text using MeCab

14
Experimental
77 GINK03/boosting-tree-tokenizer

Gradient Boosting Decision...

14
Experimental
78 btrkeks/jp-deinflector

A high-performance Rust crate for deinflecting Japanese words using perfect...

14
Experimental
79 megagonlabs/desuwa

Feature annotator to morphemes and phrases based on KNP rule files (pure-Python)

13
Experimental
80 rakutentech/pisah

Sentence Splitter Library (C++ port of pySBD)

13
Experimental
81 tetutaro/mecab_dictionaries

create various dictionaries for MeCab and MeCab CLI using fugashi

13
Experimental
82 BrambleXu/jp-stopword-filter

A lightweight Python library designed to filter stopwords from Japanese text...

12
Experimental
83 cronokirby/nicer-mecab

Japanese morphological analysis. Wrapper over mecab.

12
Experimental
84 akiomik/vibrato-dict-ipa-neologd

A compiled mecab-ipadic-neologd dictionary for vibrato

12
Experimental
85 whelk-io/hy-phen-a-tion

Java OSS library for calculating syllables and hyphenation based on Frank...

12
Experimental
86 Shusei-E/RcppJagger

RcppJagger is a wrapper package for Jagger

12
Experimental
87 proycon/hyphertool

Command-line tool for syllabification and hyphenisation for multiple languages

12
Experimental
88 bureaucratic-labs/conllu

CoNLL-U format parser

12
Experimental
89 ru-ka/syllable-divider

A WebAssembly library for syllable division in XML/DOM trees

11
Experimental
90 apakabarlabs/syllabreak-kotlin

Kotlin library for multilingual syllabification and hyphenation

11
Experimental
91 QuyAnh2005/StyleTTS-VC-Japanese

StyleTTS Voice Conversions for Japanese

11
Experimental
92 evamaxfield/cue-queue

Transcript segmentation using the average semantic encodings of cue sentences.

11
Experimental
93 milovatjp/hazuki

Japanese complexity analysis app within JLPT framework.

11
Experimental
94 TaygaHoshi/japanese-i-plus-one-filter

Finds i+1 sentences for a specific word from Jisho.org.

10
Experimental
95 d108/Samazama

Save keystrokes for iOS and macOS users by comparing shorthand input against...

10
Experimental
96 luckasRanarison/kaiseki

A Japanese tokenizer and morphological analyzer

10
Experimental