Southeast Asian NLP Tools
NLP tools and resources specifically for Southeast Asian languages (Khmer, Burmese (Myanmar), Thai, Rakhine). Includes text segmentation, transliteration, OCR, and language-specific preprocessing. Does NOT include general multilingual NLP tools, datasets for non-Southeast Asian languages, or language-agnostic NLP frameworks.
There are 40 Southeast Asian NLP tools tracked. 3 score above 50 (Established tier). The highest-rated is PyThaiNLP/attacut at 55/100, with 94 stars and 4,237 monthly downloads.
Get all 40 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=southeast-asian-nlp-tools&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
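The same request can be made from a script. The following is a minimal Python sketch that uses only the standard library and the query parameters visible in the curl command above (`domain`, `subcategory`, `limit`). The helper names `build_url` and `fetch_listing` are illustrative, not part of the API. The shape of the JSON response is not documented here, so the sketch returns it without assuming any field names. The sample request sets `limit=20`; whether a larger value returns all 40 projects is an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; the helpers below are hypothetical names.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="southeast-asian-nlp-tools", limit=20):
    """Build the listing URL with the same query parameters as the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_listing(url, timeout=10):
    """Fetch the URL and decode the body as JSON (no response schema is assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network request and counts against the daily quota):
# data = fetch_listing(build_url())
# print(json.dumps(data, indent=2, ensure_ascii=False))
```

Keeping URL construction separate from the network call makes it easy to inspect the request before spending one of the 100 keyless requests per day.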
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | PyThaiNLP/attacut: A Fast and Accurate Neural Thai Word Segmenter | 55 | Established |
| 2 | UlugbekSalaev/UzTransliterator: UzTransliterator \| State-of-the-art machine transliteration tool for Uzbek language | | Established |
| 3 | VietHoang1512/khmer-nltk: Khmer language processing toolkit | | Established |
| 4 | seanghay/KhmerOCR: A Fast Khmer Optical Character Recognition (KhmerOCR) | | Emerging |
| 5 | seanghay/khmernormalizer: A missing toolkit for Khmer Natural Language Processing. | | Emerging |
| 6 | ionite34/Aquila-Resolve: Augmented Recurrent Neural Grapheme-to-Phoneme conversion with Inflectional... | | Emerging |
| 7 | seanghay/khmerphonemizer: A Free, Standalone and Open-Source Khmer Grapheme-to-Phonemes. | | Emerging |
| 8 | Sovichea/khmer_segmenter: A zero-dependency, high-performance Khmer word segmenter using the Viterbi... | | Emerging |
| 9 | mdoumbouya/detransliterator: detransliteration library and tools | | Emerging |
| 10 | AI4Bharat/IndicNLP-Transliteration: Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with... | | Emerging |
| 11 | eimg/myanmar-text-breaker: Syllable and word, breaker/boundary-segmentation for Myanmar text in JavaScript | | Emerging |
| 12 | koomri/text-segmentation: Implementation of the paper: Text Segmentation as a Supervised Learning Task | | Emerging |
| 13 | ionite34/h2p-parser: Heteronym to Phoneme Parser | | Emerging |
| 14 | YerevaNN/translit-rnn: Automatic transliteration with LSTM | | Experimental |
| 15 | MinSiThu/Rakhine-Proverbs-Dataset: Proverbs in Rakhine/Arakan Language | | Experimental |
| 16 | Khmer-NLP/khmer-nlp: Khmer Natural Language Processing (KHNLP) | | Experimental |
| 17 | Koziev/StressModel: Neural model for prediction of stress position in Russian words | | Experimental |
| 18 | josephjojoe/syllabification: GRU-based neural network with Inception modules and an optional Linear Chain... | | Experimental |
| 19 | khmerlang/elasticsearch-analysis-khmerlang: Khmer Analysis Plugin for Elasticsearch | | Experimental |
| 20 | swanhtet1992/ReSegment: Burmese (Myanmar) syllable level segmentation with regex. | | Experimental |
| 21 | netra-ai-lab/Khmer-OCR-CNN-Transformer: A Squeeze-and-Excitation Transformer Network for Khmer Optical Character Recognition | | Experimental |
| 22 | SaPhyoThuHtet/myanmar-nlp-tool: Natural Language Processing Tool | | Experimental |
| 23 | seanghay/khmer-neural-segmenter: Khmer Neural Segmenter | | Experimental |
| 24 | Koziev/transcriber: Model to convert text to phonetic transcription and vice versa | | Experimental |
| 25 | sagorbrur/itranslit: transliteration for indic language | | Experimental |
| 26 | chanmratekoko/Awesome-Myanmar-Wordlists-Dictionary-Collection: Myanmar (Burmese) Wordlists Dictionary Collection for word segmentation,... | | Experimental |
| 27 | NDarayut/english-khmer-transliteration: An English–Khmer transliteration system built on an Attention-Based... | | Experimental |
| 28 | sagorbrur/bntranslit: Bangla Transliteration Package | | Experimental |
| 29 | alvations/myth: Myanmar and Thai Language Resources | | Experimental |
| 30 | Michael95-m/myanmar_names: Burmese name conversion with rule-based method (Burmese to English and... | | Experimental |
| 31 | papamusa/Three-word-sentences: 🔤 Master three-word sentences for clear English communication through simple... | | Experimental |
| 32 | thomas-chauvet/names_transliteration: Neural Machine Translation (NMT) applied to transliterate names in arabic... | | Experimental |
| 33 | dmitry-rvn/ru-svo-triplets: Subject-verb-object triplets extraction for russian language. | | Experimental |
| 34 | ye-kyaw-thu/MSL4Emergency: Myanmar Sign Language Corpus for Emergency Domain | | Experimental |
| 35 | Socret360/joint-khmer-word-segmentation-and-pos-tagging: A Keras implementation of a deep learning network to simultaneously perform... | | Experimental |
| 36 | shayneobrien/text-segmentation: Neural and nonneural text segmentation methods. | | Experimental |
| 37 | suralmasha/RuTranscript: Russian phonetical transcription | | Experimental |
| 38 | SaPhyoThuHtet/myanmar-part-of-speech-tagging-based-on-machine-translation: POS Tagging Based on Machine Translation (UTYCC Class Final Project) | | Experimental |
| 39 | ThuraAung1601/myTypo: myTypo : Typographic Error Simulator for Myanmar Language | | Experimental |
| 40 | eemberda/Cebuano-Syllable-Decoder: Accepts a Cebuano word and breaks it down into syllables | | Experimental |