Persian NLP Datasets: NLP Tools
Curated datasets, lexicons, and linguistic resources for Persian (Farsi) NLP tasks, including QA, sentiment analysis, text classification, and OCR. Does NOT include general multilingual resources, pre-trained models, or tools for other languages.
There are 22 Persian NLP dataset tools tracked. One scores above 50 (Established tier). The highest-rated is amirshnll/Persian-Swear-Words at 53/100 with 308 stars.
Get all 22 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=persian-nlp-datasets&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
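The same request can be made from a script. The following is a minimal Python sketch, assuming only what the curl call above shows: the endpoint URL and the `domain`, `subcategory`, and `limit` query parameters. The function names `build_url` and `fetch_projects` are hypothetical helpers, and the response schema is not documented here, so the JSON is returned unmodified. Note that the curl call passes `limit=20` while 22 projects are tracked; whether the API accepts a larger limit is not stated, so the value is left configurable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="persian-nlp-datasets", limit=20):
    """Assemble the query URL with the same parameters as the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Download and decode the JSON body.

    The response structure is an unknown here, so it is returned as-is
    rather than unpacked into assumed fields.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_url()
    print(url)
    # Uncomment to send a real request (it counts against the daily quota):
    # data = fetch_projects(url)
    # print(json.dumps(data, indent=2, ensure_ascii=False))
```

Keeping URL construction separate from the network call makes it possible to check the request offline before spending any of the 100 keyless requests per day.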
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | amirshnll/Persian-Swear-Words: Persian Swear Dataset - you can use in your production to filter unwanted... | 53/100 | Established |
| 2 | sajjjadayobi/PersianQA: Persian (Farsi) Question Answering Dataset (+ Models) | | Emerging |
| 3 | miras-tech/MirasText: MirasText | | Emerging |
| 4 | farbodbj/persian-gender-by-name: A comprehensive dataset for determining gender based on Persian names,... | | Emerging |
| 5 | dml-qom/FarsTail: FarsTail: a Persian natural language inference dataset | | Emerging |
| 6 | BodduSriPavan-111/chandassu: Chandassu: First Python Library for Global Metrical Poetry | | Emerging |
| 7 | Text-Mining/Persian-Sentiment-Resources: Awesome Persian Sentiment Analysis Resources - Resources related to sentiment analysis... | | Emerging |
| 8 | aghasemi/ChronologicalPersianPoetryDataset: A chronological (up to the century in which the poet has lived) of Persian... | | Emerging |
| 9 | ratitya/JumpLander-Persian-Forum-Dataset: 📊 Access a structured Persian forum dataset to enhance NLP models for text... | | Experimental |
| 10 | phosseini/SentiPers: SentiPers: A Sentiment Analysis Corpus for Persian https://arxiv.org/abs/1801.07737 | | Experimental |
| 11 | farbodbj/iranian-surname-frequencies: Welcome to the Persian Last Names Dataset, a comprehensive collection of... | | Experimental |
| 12 | MohammadrezaAmani/JameJamCorpus: Official repository of Jam-e Jam News Dataset and NLP Model. | | Experimental |
| 13 | amirabbasasadi/Shotor: Free Persian Word Level OCR Dataset | | Experimental |
| 14 | taesiri/PersianWordVectors: A set of pre-trained word vectors for Persian language | | Experimental |
| 15 | jumplander-readme/JumpLander-Persian-Forum-Dataset: This dataset contains a clean and structured subset of Persian community... | | Experimental |
| 16 | phosseini/LexiPers: A Sentiment Analysis Lexicon for Persian https://arxiv.org/abs/1911.05263 | | Experimental |
| 17 | hctilg/finglish: A Finglish to Persian converter. | | Experimental |
| 18 | dcaled/mint: This project provides the metadata and the crawlers to download the MIND... | | Experimental |
| 19 | Mohampouraz/Persian-poetry: A comprehensive repository of classical Persian poetry, curated from... | | Experimental |
| 20 | kargaranamir/Persian-Datasets: Persian Datasets including: Wikipedia, Twitter, Hamshahri, Hellokish,... | | Experimental |
| 21 | semnan-university-ai/persian-slang: Persian Slang Words (dataset) | | Experimental |
| 22 | IR1401-Spring-Final-Projects/Saadi1401-5_23: 1401/Spring/InformationRetrieval/g5+23 | | Experimental |