Synthetic Data Generation Generative AI Tools

Tools for generating synthetic tabular, time-series, and structured data with focus on fidelity, privacy, and utility evaluation. Includes SDV frameworks, GANs, diffusion models, and benchmarking suites. Does NOT include general data augmentation for NLP/NER tasks or domain-specific synthetic generation (clinical data, images, audio).

79 synthetic data generation tools are tracked. Two score above 70 (Verified tier). The highest-rated is sdv-dev/SDV at 94/100, with 3,439 stars and 150,480 monthly downloads. Two of the top 10 are actively maintained.

Get the projects as JSON (the example request below sets limit=20; 79 projects are tracked)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=generative-ai&subcategory=synthetic-data-generation&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
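The same request can be issued from a script. The following is a minimal Python sketch that rebuilds the query string from the curl example and decodes the response. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented on this page, so the sketch makes no assumption about its fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters are copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="generative-ai",
              subcategory="synthetic-data-generation",
              limit=20):
    """Build the request URL (hypothetical helper, not part of any SDK)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and return the decoded JSON payload.

    The response schema is an unknown here; inspect the result
    before relying on any particular key.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example (performs a network call, counts against the daily quota):
# data = fetch_projects(build_url())
# print(type(data).__name__)
```

Keeping URL construction separate from the network call makes it easy to check the query string without spending any of the daily request allowance.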

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | sdv-dev/SDV | Synthetic data generation for tabular data | 94 | Verified |
| 2 | sdv-dev/SDGym | Benchmarking synthetic data generation methods. | 72 | Verified |
| 3 | NVIDIA-NeMo/DataDesigner | 🎨 NeMo Data Designer: A general library for generating high-quality... | 61 | Established |
| 4 | AlexanderVNikitin/tsgm | Generation and evaluation of synthetic time series datasets (also,... | 57 | Established |
| 5 | wwhenxuan/S2Generator | A series-symbol (S2) dual-modality data generation mechanism, enabling the... | 56 | Established |
| 6 | hitsz-ids/synthetic-data-generator | SDG is a specialized framework designed to generate high-quality structured... | 55 | Established |
| 7 | mostly-ai/mostlyai | Synthetic Data SDK ✨ | 54 | Established |
| 8 | microsoft/genalog | Genalog is an open source, cross-platform python package allowing generation... | 49 | Emerging |
| 9 | microsoft/TimeCraft | Official code for TimeCraft: A Time Series Generation Framework for... | 45 | Emerging |
| 10 | sebhaan/TabPFGen | TabPFGen: Synthetic Tabular Data Generation with TabPFN | 44 | Emerging |
| 11 | aiim-research/GRETEL | GRETEL is a framework for the development and evaluation of Counterfactual... | 44 | Emerging |
| 12 | nhatkhangcs/synthetic_generator | Synthetic Data Generator for Machine Learning Pipelines | 43 | Emerging |
| 13 | gretelai/gretel-synthetics | Synthetic data generators for structured and unstructured text, featuring... | 40 | Emerging |
| 14 | kayua/MalDataGen | MalDataGen is an advanced Python framework for generating and evaluating... | 37 | Emerging |
| 15 | ELM-Research/ECG-Neural-Networks | Research-oriented pretraining and evaluation pipelines for ECG-specific... | 37 | Emerging |
| 16 | highfem/tqdne | Generative modeling of seismic waveforms | 36 | Emerging |
| 17 | Clearbox-AI/clearbox-synthetic-kit | Clearbox AI's all-in-one solution for generation and evaluation of synthetic... | 36 | Emerging |
| 18 | pedrodevog/SynthECG | The first systematic evaluation framework for synthetic 10-second 12-lead... | 35 | Emerging |
| 19 | mims-harvard/CLEF | Controllable Sequence Editing for Counterfactual Generation | 34 | Emerging |
| 20 | SilenceX12138/TabEval | 📐 A comprehensive Python framework for evaluating tabular data. | 34 | Emerging |
| 21 | telmomenezes/synthetic | Symbolic Generators for Complex Networks | 34 | Emerging |
| 22 | shadowboxingskills/ppchain | Your Probabilistic Modeling Copilot | 34 | Emerging |
| 23 | jameszhou-gl/HiSGT | Code for ECAI'25-Generating Clinically Realistic EHR Data via a Hierarchy-... | 33 | Emerging |
| 24 | KodCode-AI/kodcode | ✨ A synthetic dataset generation framework that produces diverse coding... | 33 | Emerging |
| 25 | Gurobi/gurobi-ai-modeling | Generative AI for Mathematical Modeling | 32 | Emerging |
| 26 | Shekswess/synthgenai | SynthGenAI - Package for Generating Synthetic Datasets using LLMs. | 32 | Emerging |
| 27 | SilenceX12138/TabStruct | 🗼 [ICLR 2026 Oral] Official implementation of “TabStruct: Measuring... | 31 | Emerging |
| 28 | iperov/SSHG | Simple Synthetic Head Generator | 30 | Emerging |
| 29 | ComplexData-MILA/AIF-Gen | Generating Synthetic Lifelong RL Data for LLMs at Scale | 30 | Emerging |
| 30 | Lysarthas/Time-Transformer | [SDM24] Official code for "Time-Transformer" | 29 | Experimental |
| 31 | caetas/GenerativeZoo | Model Zoo for Generative Models. | 29 | Experimental |
| 32 | grantzyr/MM-Health-Dataset | [EMNLP 2025 Findings] Official repo for paper: From Generation to Detection:... | 28 | Experimental |
| 33 | zjowowen/FuncGenFoil | Airfoil Generation and Editing Model in Function Space | 28 | Experimental |
| 34 | zealscott/SynMeter | A principled library for tuning, training and evaluating tabular data... | 28 | Experimental |
| 35 | ViacheslavDanilov/generative_design | This repository is dedicated to the development of an approach based on... | 27 | Experimental |
| 36 | Sreyan88/DALE | Code for EMNLP 2023 paper: DALE: Generative Data Augmentation for... | 27 | Experimental |
| 37 | markweberdev/maskbit | Implementation of the paper "MaskBit: Embedding-free Image Generation from... | 27 | Experimental |
| 38 | filipaldi/ai-font-generation-projects | AI Font Generation Benchmarks. Comparative analysis of AI font generation... | 27 | Experimental |
| 39 | KonstantinosBarmpas/NeuroRVQ | NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models | 26 | Experimental |
| 40 | Trustworthy-ML-Lab/posthoc-generative-cbm | [CVPR 2025] Concept Bottleneck Autoencoder (CB-AE) -- efficiently transform... | 26 | Experimental |
| 41 | OpenProteinAI/openprotein-python | Simple python interface for the OpenProtein.AI REST API. | 25 | Experimental |
| 42 | DorinDaniil/Garage | Cutting-edge Python library designed for generative image augmentation! | 25 | Experimental |
| 43 | ELM-Research/ecg_nn | Research-oriented pretraining and evaluation pipelines for ECG-specific... | 25 | Experimental |
| 44 | michelecafagna26/vl-shap | [Frontiers in AI Journal] Implementation of the paper "Interpreting Vision... | 23 | Experimental |
| 45 | Lee-CBG/TCRGen | Self-Contemplating In-Context Learning Enhances T Cell Receptor Generation... | 22 | Experimental |
| 46 | ML4ITS/synthetic-data | Generate synthetic time-series using generative adversarial networks.... | 22 | Experimental |
| 47 | Diegomangasco/GenSUMO | Generative AI to create synthetic SUMO scenarios | 22 | Experimental |
| 48 | jameszhou-gl/Coogee | Coogee: An integrated pipeline for generating and auditing clinically... | 22 | Experimental |
| 49 | AmirhosseinHonardoust/Synthetic-Data-Artist | A professional, research-grade comparison of Gaussian Copula and Variational... | 21 | Experimental |
| 50 | MorningStarTM/Synthetic-Data-Generator | This Project for Creating unified tool to generate synthetic data (text and... | 21 | Experimental |
| 51 | kayua/SyntheticOceanAI | SyntheticOcean: Open-Source Library for Generating Synthetic Tabular Data +... | 19 | Experimental |
| 52 | AIML-MED/Mirror-CFE | [ICCV25] Looking in the Mirror: A Faithful Counterfactual Explanation Method... | 19 | Experimental |
| 53 | KonstantinosBarmpas/LaBraM-plus-plus | [NeurIPS 2025] Neural Information Processing Systems(2025) - Foundation... | 18 | Experimental |
| 54 | Mycheaux/DB-conv | Self-supervised generative AI enables conversion of two non-overlapping... | 18 | Experimental |
| 55 | Sreyan88/CoDa | Code for NAACL 2024 (Findings) Paper: CoDa: Constrained Generation based... | 18 | Experimental |
| 56 | HowieHwong/DataGen | [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models | 17 | Experimental |
| 57 | FishAres/RNP6 | Code for Recursive Neural Programs: A differentiable framework for learning... | 16 | Experimental |
| 58 | vertaix/Alternators | This repository contains the implementation of Alternators, a novel... | 16 | Experimental |
| 59 | Sreyan88/ACLM | Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data... | 15 | Experimental |
| 60 | Sreyan88/ABEX | Code for ACL 2024 paper -- ABEX: Data Augmentation for Low-Resource NLU via... | 15 | Experimental |
| 61 | dario-coscia/barnn | BARNN: A Bayesian Autoregressive and Recurrent Neural Network - Official Repository | 14 | Experimental |
| 62 | alexkoulakos/explain-then-predict | Source code for the BlackBoxNLP 2024 @ EMNLP paper "Enhancing adversarial... | 14 | Experimental |
| 63 | rubsj/ai-synthetic-data-generator | Synthetic dataset generation pipeline with Pydantic validation and... | 14 | Experimental |
| 64 | Chun-Bae/eeg-emotion-gen-compare | Comparing generative models for EEG emotion classification. | 14 | Experimental |
| 65 | kj14173/neuro-sequential-generative-core | A research-oriented implementation of sequential generative models for... | 14 | Experimental |
| 66 | marquito3012/TFM | Generative AI framework for creating synthetic tabular data in... | 14 | Experimental |
| 67 | Sreyan88/Synthio | Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio... | 14 | Experimental |
| 68 | j9smith/generative-modelling | Notebook series exploring the theory and implementation of various generative models. | 12 | Experimental |
| 69 | DanteTrb/fall-risk-predictor | A fullstack AI-powered web application to assess fall risk in patients with... | 12 | Experimental |
| 70 | rizac/gmgt | Ground Motion Ground Truth is a collection of datasets of ground motion time... | 12 | Experimental |
| 71 | yrodriguezmd/Synthetic_Medical_Tabular_Data | Generate synthetic medical data from a patient population dataset. | 12 | Experimental |
| 72 | cMancio00/ebm-molecules | This is my thesis for Computer Science master degree at University of Florence | 12 | Experimental |
| 73 | ImJaeSung/Synthesizers | Implementations of various synthesizers with pytorch. | 12 | Experimental |
| 74 | NITHISHM2410/spatial-temporal-transformer | Spatial Temporal Transformer to capture Spatial and Temporal dynamics. | 12 | Experimental |
| 75 | silvano315/Gen-AI-for-Data-Augmentation | This is the ninth project of AI Engineering Master. It aims to use... | 11 | Experimental |
| 76 | Okja88/Visual-GenAI-Applications | A comprehensive portfolio of Visual Generative AI projects featuring... | 11 | Experimental |
| 77 | wilhelmagren/syndgen | SYNthetic Data GENeration made easy for everyone, free and open-sourced. | 11 | Experimental |
| 78 | shadowboxingskills/ppchainR | Your Probabilistic Modeling Copilot | 10 | Experimental |
| 79 | tacclab/bio_dataset_manager | This tool facilitates the encoding of these sequences into tensors, which... | 10 | Experimental |
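
The tier labels in the listing above appear to follow score bands, although the page only states that scores above 70 are Verified. As a rough Python sketch, the mapping below uses cutoffs of 70, 50, and 30. These cutoffs are assumptions inferred from the listed scores (Verified 72 to 94, Established 54 to 61, Emerging 30 to 49, Experimental 10 to 29), not documented thresholds, and the function name `tier_for` is hypothetical.

```python
# Assumed cutoffs, inferred from the scores in the listing above.
# The real boundaries may differ (for example, whether exactly 70 is Verified).
ASSUMED_CUTOFFS = [
    (70, "Verified"),
    (50, "Established"),
    (30, "Emerging"),
]


def tier_for(score):
    """Map a 0-100 quality score to a tier label under the assumed cutoffs."""
    for minimum, label in ASSUMED_CUTOFFS:
        if score >= minimum:
            return label
    return "Experimental"


# A few (tool, score) pairs copied from the listing, for a quick check.
sample = [
    ("sdv-dev/SDV", 94),
    ("mostly-ai/mostlyai", 54),
    ("iperov/SSHG", 30),
    ("caetas/GenerativeZoo", 29),
]
for tool, score in sample:
    print(f"{tool}: {tier_for(score)}")
```

Under these assumed bands, every score in the listing maps to the tier shown next to it, but a score falling in one of the unobserved gaps (for example 50 to 53 or 62 to 69) could be labeled differently by the site.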