Southeast Asian NLP Tools

NLP tools and resources specifically for Southeast Asian languages (Khmer, Burmese (Myanmar), Thai, Rakhine). Includes text segmentation, transliteration, OCR, and language-specific preprocessing. Does NOT include general multilingual NLP tools, datasets for non-Southeast Asian languages, or language-agnostic NLP frameworks.

There are 40 Southeast Asian NLP tools tracked. Three score 50 or higher (Established tier). The highest-rated is PyThaiNLP/attacut at 55/100, with 94 stars and 4,237 monthly downloads.

Get all 40 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=southeast-asian-nlp-tools&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
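The same request can be made from Python. The following is a minimal sketch using only the standard library; it reuses the URL and query parameters from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented on this page, so the code returns the parsed response as-is rather than assuming any field names. Note that the curl command passes `limit=20`; whether a larger value returns all 40 projects in one call is an assumption to verify against the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    params = {
        "domain": "nlp",
        "subcategory": "southeast-asian-nlp-tools",
        "limit": limit,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the project list and return the parsed JSON (structure not assumed)."""
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)


# Example usage (performs a network request and counts against the daily quota):
#   data = fetch_projects()
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the query string without spending one of the 100 keyless requests per day.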

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | PyThaiNLP/attacut | A Fast and Accurate Neural Thai Word Segmenter | 55 | Established |
| 2 | UlugbekSalaev/UzTransliterator | UzTransliterator \| State-of-the-art machine transliteration tool for Uzbek language | 54 | Established |
| 3 | VietHoang1512/khmer-nltk | Khmer language processing toolkit | 50 | Established |
| 4 | seanghay/KhmerOCR | A Fast Khmer Optical Character Recognition (KhmerOCR) | 43 | Emerging |
| 5 | seanghay/khmernormalizer | A missing toolkit for Khmer Natural Language Processing. | 41 | Emerging |
| 6 | ionite34/Aquila-Resolve | Augmented Recurrent Neural Grapheme-to-Phoneme conversion with Inflectional... | 41 | Emerging |
| 7 | seanghay/khmerphonemizer | A Free, Standalone and Open-Source Khmer Grapheme-to-Phonemes. | 38 | Emerging |
| 8 | Sovichea/khmer_segmenter | A zero-dependency, high-performance Khmer word segmenter using the Viterbi... | 37 | Emerging |
| 9 | mdoumbouya/detransliterator | detransliteration library and tools | 36 | Emerging |
| 10 | AI4Bharat/IndicNLP-Transliteration | Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with... | 35 | Emerging |
| 11 | eimg/myanmar-text-breaker | Syllable and word, breaker/boundary-segmentation for Myanmar text in JavaScript | 33 | Emerging |
| 12 | koomri/text-segmentation | Implementation of the paper: Text Segmentation as a Supervised Learning Task | 33 | Emerging |
| 13 | ionite34/h2p-parser | Heteronym to Phoneme Parser | 30 | Emerging |
| 14 | YerevaNN/translit-rnn | Automatic transliteration with LSTM | 29 | Experimental |
| 15 | MinSiThu/Rakhine-Proverbs-Dataset | Proverbs in Rakhine/Arakan Language | 27 | Experimental |
| 16 | Khmer-NLP/khmer-nlp | Khmer Natural Language Processing (KHNLP) | 27 | Experimental |
| 17 | Koziev/StressModel | Neural model for prediction of stress position in Russian words | 27 | Experimental |
| 18 | josephjojoe/syllabification | GRU-based neural network with Inception modules and an optional Linear Chain... | 26 | Experimental |
| 19 | khmerlang/elasticsearch-analysis-khmerlang | Khmer Analysis Plugin for Elasticsearch | 25 | Experimental |
| 20 | swanhtet1992/ReSegment | Burmese (Myanmar) syllable level segmentation with regex. | 25 | Experimental |
| 21 | netra-ai-lab/Khmer-OCR-CNN-Transformer | A Squeeze-and-Excitation Transformer Network for Khmer Optical Character Recognition | 25 | Experimental |
| 22 | SaPhyoThuHtet/myanmar-nlp-tool | Natural Language Processing Tool | 24 | Experimental |
| 23 | seanghay/khmer-neural-segmenter | Khmer Neural Segmenter | 24 | Experimental |
| 24 | Koziev/transcriber | Model to convert text to phonetic transcription and vice versa | 24 | Experimental |
| 25 | sagorbrur/itranslit | transliteration for indic language | 24 | Experimental |
| 26 | chanmratekoko/Awesome-Myanmar-Wordlists-Dictionary-Collection | Myanmar (Burmese) Wordlists Dictionary Collection for word segmentation,... | 23 | Experimental |
| 27 | NDarayut/english-khmer-transliteration | An English–Khmer transliteration system built on an Attention-Based... | 22 | Experimental |
| 28 | sagorbrur/bntranslit | Bangla Transliteration Package | 22 | Experimental |
| 29 | alvations/myth | Myanmar and Thai Language Resources | 19 | Experimental |
| 30 | Michael95-m/myanmar_names | Burmese name conversion with rule-based method (Burmese to English and... | 16 | Experimental |
| 31 | papamusa/Three-word-sentences | 🔤 Master three-word sentences for clear English communication through simple... | 14 | Experimental |
| 32 | thomas-chauvet/names_transliteration | Neural Machine Translation (NMT) applied to transliterate names in arabic... | 14 | Experimental |
| 33 | dmitry-rvn/ru-svo-triplets | Subject-verb-object triplets extraction for russian language. | 14 | Experimental |
| 34 | ye-kyaw-thu/MSL4Emergency | Myanmar Sign Language Corpus for Emergency Domain | 13 | Experimental |
| 35 | Socret360/joint-khmer-word-segmentation-and-pos-tagging | A Keras implementation of a deep learning network to simultaneously perform... | 13 | Experimental |
| 36 | shayneobrien/text-segmentation | Neural and nonneural text segmentation methods. | 13 | Experimental |
| 37 | suralmasha/RuTranscript | Russian phonetical transcription | 12 | Experimental |
| 38 | SaPhyoThuHtet/myanmar-part-of-speech-tagging-based-on-machine-translation | POS Tagging Based on Machine Translation (UTYCC Class Final Project) | 11 | Experimental |
| 39 | ThuraAung1601/myTypo | myTypo : Typographic Error Simulator for Myanmar Language | 10 | Experimental |
| 40 | eemberda/Cebuano-Syllable-Decoder | Accepts a Cebuano word and breaks it down into syllables | 10 | Experimental |
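The tier labels in the listing appear to follow fixed score cutoffs. The sketch below encodes cutoffs inferred only from the rows shown here: the lowest Established score listed is 50, the Emerging scores range from 30 to 43, and the highest Experimental score is 29. The function name `tier_for` and the exact boundaries are assumptions; no score between 44 and 49 appears in the listing, so the true Emerging/Established boundary could differ.

```python
def tier_for(score):
    """Map a 0-100 quality score to a tier label.

    Cutoffs are inferred from the listing on this page, not from
    documented scoring rules, so treat them as an assumption.
    """
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# A few (tool, score) pairs copied from the listing.
sample = [
    ("PyThaiNLP/attacut", 55),
    ("VietHoang1512/khmer-nltk", 50),
    ("ionite34/h2p-parser", 30),
    ("YerevaNN/translit-rnn", 29),
]

for name, score in sample:
    print(f"{name}: {score} -> {tier_for(score)}")
```

Each sample pair maps to the same tier the listing assigns, which is the only check these inferred cutoffs support.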

Comparisons in this category