ManaTTS-Persian-Speech-Dataset and GPTInformal-Persian-Speech-Dataset

These are complementary datasets for Persian text-to-speech development. ManaTTS provides the larger foundation dataset (114+ hours) for training robust models, while GPTInformal-Persian-Speech-Dataset is a smaller, specialized dataset (6+ hours) with semantic labeling (subject metadata), suited to fine-tuning or domain-specific TTS applications.

ManaTTS-Persian-Speech-Dataset: Emerging

Maintenance 2/25, Adoption 8/25, Maturity 16/25, Community 10/25

Stars: 49, Forks: 5, Downloads: —, Commits (30d): 0, Language: Jupyter Notebook, License: MIT

GPTInformal-Persian-Speech-Dataset: Experimental

Maintenance 2/25, Adoption 5/25, Maturity 9/25, Community 7/25

Stars: 10, Forks: 1, Downloads: —, Commits (30d): 0, Language: —, License: MIT
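
The four category scores listed above are each out of 25. The page does not say how they combine into an overall rating, so the following is only a minimal sketch that assumes a plain sum out of 100. It also assumes that the first block of scores belongs to ManaTTS and the second to GPTInformal, following the order in which the repositories are listed. The function name total_score is hypothetical.

```python
# Category scores copied from the listing above (each is out of 25).
# ASSUMPTION: the first score block is ManaTTS, the second is GPTInformal.
scores = {
    "ManaTTS-Persian-Speech-Dataset": {
        "Maintenance": 2, "Adoption": 8, "Maturity": 16, "Community": 10,
    },
    "GPTInformal-Persian-Speech-Dataset": {
        "Maintenance": 2, "Adoption": 5, "Maturity": 9, "Community": 7,
    },
}


def total_score(categories):
    """Sum the category scores.

    ASSUMPTION: the overall score is a plain sum (maximum 100);
    the page itself does not document the formula.
    """
    return sum(categories.values())


totals = {name: total_score(cats) for name, cats in scores.items()}
for name, total in totals.items():
    print(f"{name}: {total}/100")
```

Under that assumption, the gap between the two repositories comes mostly from Maturity (16 vs 9) and Adoption (8 vs 5), since both share the same Maintenance score.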

Stale 6m, No Package, No Dependents

About ManaTTS-Persian-Speech-Dataset

MahtaFetrat/ManaTTS-Persian-Speech-Dataset

ManaTTS is the largest open Persian speech dataset, with 114+ hours of transcribed audio. The repository includes the data collection pipeline and tools. The dataset is suitable for training Persian text-to-speech models.

About GPTInformal-Persian-Speech-Dataset

MahtaFetrat/GPTInformal-Persian-Speech-Dataset

A freely licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject

Scores updated daily from GitHub, PyPI, and npm data.