Arabic Text Normalization NLP Tools

Tools for Arabic-specific text processing including diacritization (vowelization), dialect identification/classification, and transliteration between Arabic scripts and romanization systems. Does NOT include general morphological analysis, stemming, or non-Arabic language processing.

15 Arabic text normalization tools are tracked. One scores above 50 (Established tier). The highest-rated is linuxscout/mishkal at 51/100 with 307 stars.

Get all 15 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=arabic-text-normalization&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
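The same request can be made from a script. Below is a minimal Python sketch using only the standard library; it rebuilds the query string from the curl command above. The helper names `build_url` and `fetch_projects` are illustrative, and the shape of the returned JSON is not documented here, so the sketch returns the decoded payload as-is rather than assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="arabic-text-normalization", limit=20):
    """Assemble the request URL with the same query parameters as the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the JSON body.

    The response schema is an unknown here, so the decoded object
    is returned unchanged for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example (performs a network request, counts against the daily quota):
# data = fetch_projects(build_url())
# print(json.dumps(data, indent=2, ensure_ascii=False))
```

Keeping URL construction separate from the network call makes it easy to change `subcategory` or `limit` and to check the URL without spending one of the 100 daily requests.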

# | Tool | Description | Score | Tier
1 | linuxscout/mishkal | Mishkal is an Arabic text vocalization software | 51 | Established
2 | AliOsm/arabic-text-diacritization | Benchmark Arabic text diacritization dataset | 44 | Emerging
3 | mush42/libtashkeel | Add Arabic diacritics (tashkeel/harakat) using Rust/Python/C++/WASM and NLP models | 43 | Emerging
4 | AliOsm/shakkelha | Neural Arabic text diacritization | 41 | Emerging
5 | hb20007/greek-dialect-classifier | Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek | 38 | Emerging
6 | BasmaElhoseny01/Tashkeel | A system that takes a sentence and produces the same sentence after... | 24 | Experimental
7 | AbdelrahmanHamdyy/Arabic-Text-Diacritization | Course Project for Natural Language Processing | 23 | Experimental
8 | saobou/DSAraby | We've created a library named "DSAraby" that aims to transliterate text... | 23 | Experimental
9 | WoLFi22/DialectClassificationPipeline | This repository provides a pipeline for dialect classification using deep... | 21 | Experimental
10 | hazemhosny/ArabicDialectClassification | Arabic Dialect Sentiment Analysis | 16 | Experimental
11 | norhanreda/Arabic-Text-Diacritization | Diacritics are short vowels with a constant length that are spoken. The same... | 16 | Experimental
12 | textgain/redcrow | Arabic Dialect Identifier | 16 | Experimental
13 | Crinmatic/Diacritic-Restoration | Using AI to restore Diacritics on Yoruba language (which is a low resource language) | 16 | Experimental
14 | adelelwan24/Arabic-Dialect-Classification | Many countries speak Arabic; however, each country has its own dialect, the... | 15 | Experimental
15 | nipponjo/arabic_vocalizer | Arabic deep-learning based diacritization models (Shakkala, Shakkelha) in... | 14 | Experimental