PyThaiNLP/attacut
A Fast and Accurate Neural Thai Word Segmenter
Built on a 3-layer dilated CNN architecture that processes syllable and character features, AttaCut achieves 91% word-level F1 on the BEST benchmark while running 6x faster than previous state-of-the-art approaches. It integrates with PyTorch and provides both command-line and Python APIs for immediate use, with support for custom model retraining on user datasets. The toolkit includes pre-trained models (`attacut-sc` and `attacut-c`) optimized for different accuracy-speed tradeoffs in Thai NLP pipelines.
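As a minimal sketch of the Python API mentioned above, the snippet below wraps a tokenizer in a small batch helper. It assumes the `attacut` package exposes a `Tokenizer` class that accepts `model="attacut-sc"` or `model="attacut-c"` and has a `tokenize` method, as the project's README describes; verify this against the version you install. The helper names `load_attacut` and `segment_lines` are hypothetical and are not part of the library.

```python
from typing import Callable, Iterable, List


def load_attacut(model: str = "attacut-sc") -> Callable[[str], List[str]]:
    """Return a tokenize function backed by AttaCut (hypothetical helper).

    The import is done lazily so the rest of this sketch can be used
    without the package installed. Assumes `pip install attacut`.
    """
    # Assumption: `Tokenizer(model=...)` and `.tokenize(text)` exist as in the README.
    from attacut import Tokenizer

    return Tokenizer(model=model).tokenize


def segment_lines(
    lines: Iterable[str],
    tokenize_fn: Callable[[str], List[str]],
    sep: str = "|",
) -> List[str]:
    """Segment each non-empty line and join its words with `sep`."""
    segmented = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            segmented.append(sep.join(tokenize_fn(line)))
    return segmented


# Usage (requires the attacut package and its PyTorch dependency):
#   tokenize = load_attacut("attacut-sc")
#   print(segment_lines(["ทดสอบการตัดคำภาษาไทย"], tokenize))
```

Passing the tokenizer in as a function keeps the helper independent of which pre-trained model is chosen, so swapping `attacut-sc` for `attacut-c` is a one-argument change.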
94 stars and 4,237 monthly downloads. Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Stars: 94
Forks: 18
Language: Python
License: MIT
Category:
Last pushed: Jan 14, 2025
Monthly downloads: 4,237
Commits (30d): 0
Dependencies: 8
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/PyThaiNLP/attacut"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
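The same request can be sketched in Python with only the standard library. Two things here are assumptions rather than documented facts: that the path segments after `quality/` are category, owner, and repository (inferred from the single URL above), and that the endpoint returns JSON. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (assumed layout: /<category>/<owner>/<repo>)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON; its schema is not described
    on this page, so the parsed object is returned as-is.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network request and counts against the daily limit):
#   data = fetch_quality("nlp", "PyThaiNLP", "attacut")
#   print(data)
```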
Related tools
VietHoang1512/khmer-nltk
Khmer language processing toolkit
UlugbekSalaev/UzTransliterator
UzTransliterator | State-of-the-art machine transliteration tool for Uzbek language
seanghay/KhmerOCR
A Fast Khmer Optical Character Recognition (KhmerOCR)
AI4Bharat/IndicNLP-Transliteration
Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with Transformer-based...
ionite34/Aquila-Resolve
Augmented Recurrent Neural Grapheme-to-Phoneme conversion with Inflectional Orthography.