Arabic Text Normalization NLP Tools
Tools for Arabic-specific text processing including diacritization (vowelization), dialect identification/classification, and transliteration between Arabic scripts and romanization systems. Does NOT include general morphological analysis, stemming, or non-Arabic language processing.
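To make the diacritization task concrete, here is a minimal sketch of its inverse: stripping the short-vowel marks (harakat) from vocalized text. The tools listed below attempt the harder direction, restoring these marks. The function name `strip_diacritics` and the chosen code point range (U+064B to U+0652) are illustrative assumptions, not taken from any listed project.

```python
# Illustrative only: remove the core Arabic diacritics (tashkeel/harakat).
# Assumption: the range U+064B..U+0652 (fathatan through sukun) is treated
# as "diacritics"; real tools may also handle other marks.
HARAKAT_START = 0x064B  # ARABIC FATHATAN
HARAKAT_END = 0x0652    # ARABIC SUKUN


def strip_diacritics(text: str) -> str:
    """Return text with code points in the harakat range removed."""
    return "".join(
        ch for ch in text
        if not (HARAKAT_START <= ord(ch) <= HARAKAT_END)
    )


if __name__ == "__main__":
    # kaf + fatha, ta + fatha, ba + fatha ("kataba", vocalized)
    vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
    print(strip_diacritics(vocalized))
```

A diacritization tool takes the bare output of such a function as input and predicts which mark, if any, belongs on each letter.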
There are 15 Arabic text normalization tools tracked. One scores above 50 (established tier). The highest-rated is linuxscout/mishkal at 51/100 with 307 stars.
Get all 15 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=arabic-text-normalization&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
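The same request can be made from a script. The sketch below builds the query URL from the three parameters visible in the curl command (`domain`, `subcategory`, `limit`) and returns the parsed JSON. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the response is not documented here, so the code makes no assumption about its fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="arabic-text-normalization", limit=20):
    """Assemble the request URL from its query parameters."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the decoded JSON payload as-is.

    The response schema is not assumed; callers should inspect it.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_projects(build_url())
    # Pretty-print whatever structure the API returns.
    print(json.dumps(data, indent=2, ensure_ascii=False))
```

Without a key, requests count against the 100/day limit, so caching the response locally is advisable when iterating on a script.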
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | linuxscout/mishkal: Mishkal is an Arabic text vocalization software | 51 | Established |
| 2 | AliOsm/arabic-text-diacritization: Benchmark Arabic text diacritization dataset | | Emerging |
| 3 | mush42/libtashkeel: Add Arabic diacritics (tashkeel/harakat) using Rust/Python/C++/WASM and NLP models | | Emerging |
| 4 | AliOsm/shakkelha: Neural Arabic text diacritization | | Emerging |
| 5 | hb20007/greek-dialect-classifier: Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek | | Emerging |
| 6 | BasmaElhoseny01/Tashkeel: A system that takes a sentence and produces the same sentence after... | | Experimental |
| 7 | AbdelrahmanHamdyy/Arabic-Text-Diacritization: Course Project for Natural Language Processing | | Experimental |
| 8 | saobou/DSAraby: We've created a library named "DSAraby" that aims to transliterate text... | | Experimental |
| 9 | WoLFi22/DialectClassificationPipeline: This repository provides a pipeline for dialect classification using deep... | | Experimental |
| 10 | hazemhosny/ArabicDialectClassification: Arabic Dialect Sentiment Analysis | | Experimental |
| 11 | norhanreda/Arabic-Text-Diacritization: Diacritics are short vowels with a constant length that are spoken. The same... | | Experimental |
| 12 | textgain/redcrow: Arabic Dialect Identifier | | Experimental |
| 13 | Crinmatic/Diacritic-Restoration: Using AI to restore Diacritics on Yoruba language (which is a low resource language) | | Experimental |
| 14 | adelelwan24/Arabic-Dialect-Classification: Many countries speak Arabic; however, each country has its own dialect, the... | | Experimental |
| 15 | nipponjo/arabic_vocalizer: Arabic deep-learning based diacritization models (Shakkala, Shakkelha) in... | | Experimental |