Synthetic Data Generation Generative AI Tools
Tools for generating synthetic tabular, time-series, and structured data, with a focus on fidelity, privacy, and utility evaluation. Includes SDV frameworks, GANs, diffusion models, and benchmarking suites. Does NOT include general data augmentation for NLP/NER tasks or domain-specific synthetic generation (clinical data, images, audio).
There are 79 synthetic data generation tools tracked. 2 score above 70 (verified tier). The highest-rated is sdv-dev/SDV at 94/100 with 3,439 stars and 150,480 monthly downloads. 2 of the top 10 are actively maintained.
Get all 79 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=generative-ai&subcategory=synthetic-data-generation&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
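The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the shape of the JSON response is not documented on this page, so the sketch returns the parsed payload without assuming any field names. Only the `domain`, `subcategory`, and `limit` parameters shown in the curl command are used. Whether a larger `limit` returns all 79 projects in one call is not stated here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="generative-ai",
                      subcategory="synthetic-data-generation",
                      limit=20):
    """Build the request URL from the three query parameters in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=10.0):
    """Fetch the URL and parse the body as JSON.

    The response schema is an unknown here, so the parsed object is
    returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality(build_quality_url())` issues one request against the daily quota described above; inspecting the returned object is the safest way to learn its structure before relying on any particular key.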
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | sdv-dev/SDV - Synthetic data generation for tabular data | | Verified |
| 2 | sdv-dev/SDGym - Benchmarking synthetic data generation methods. | | Verified |
| 3 | NVIDIA-NeMo/DataDesigner - 🎨 NeMo Data Designer: A general library for generating high-quality... | | Established |
| 4 | AlexanderVNikitin/tsgm - Generation and evaluation of synthetic time series datasets (also,... | | Established |
| 5 | wwhenxuan/S2Generator - A series-symbol (S2) dual-modality data generation mechanism, enabling the... | | Established |
| 6 | hitsz-ids/synthetic-data-generator - SDG is a specialized framework designed to generate high-quality structured... | | Established |
| 7 | mostly-ai/mostlyai - Synthetic Data SDK ✨ | | Established |
| 8 | microsoft/genalog - Genalog is an open source, cross-platform python package allowing generation... | | Emerging |
| 9 | microsoft/TimeCraft - Official code for TimeCraft: A Time Series Generation Framework for... | | Emerging |
| 10 | sebhaan/TabPFGen - TabPFGen: Synthetic Tabular Data Generation with TabPFN | | Emerging |
| 11 | aiim-research/GRETEL - GRETEL is a framework for the development and evaluation of Counterfactual... | | Emerging |
| 12 | nhatkhangcs/synthetic_generator - Synthetic Data Generator for Machine Learning Pipelines | | Emerging |
| 13 | gretelai/gretel-synthetics - Synthetic data generators for structured and unstructured text, featuring... | | Emerging |
| 14 | kayua/MalDataGen - MalDataGen is an advanced Python framework for generating and evaluating... | | Emerging |
| 15 | ELM-Research/ECG-Neural-Networks - Research-oriented pretraining and evaluation pipelines for ECG-specific... | | Emerging |
| 16 | highfem/tqdne - Generative modeling of seismic waveforms | | Emerging |
| 17 | Clearbox-AI/clearbox-synthetic-kit - Clearbox AI's all-in-one solution for generation and evaluation of synthetic... | | Emerging |
| 18 | pedrodevog/SynthECG - The first systematic evaluation framework for synthetic 10-second 12-lead... | | Emerging |
| 19 | mims-harvard/CLEF - Controllable Sequence Editing for Counterfactual Generation | | Emerging |
| 20 | SilenceX12138/TabEval - 📐 A comprehensive Python framework for evaluating tabular data. | | Emerging |
| 21 | telmomenezes/synthetic - Symbolic Generators for Complex Networks | | Emerging |
| 22 | shadowboxingskills/ppchain - Your Probabilistic Modeling Copilot | | Emerging |
| 23 | jameszhou-gl/HiSGT - Code for ECAI'25-Generating Clinically Realistic EHR Data via a Hierarchy-... | | Emerging |
| 24 | KodCode-AI/kodcode - ✨ A synthetic dataset generation framework that produces diverse coding... | | Emerging |
| 25 | Gurobi/gurobi-ai-modeling - Generative AI for Mathematical Modeling | | Emerging |
| 26 | Shekswess/synthgenai - SynthGenAI - Package for Generating Synthetic Datasets using LLMs. | | Emerging |
| 27 | SilenceX12138/TabStruct - 🗼 [ICLR 2026 Oral] Official implementation of “TabStruct: Measuring... | | Emerging |
| 28 | iperov/SSHG - Simple Synthetic Head Generator | | Emerging |
| 29 | ComplexData-MILA/AIF-Gen - Generating Synthetic Lifelong RL Data for LLMs at Scale | | Emerging |
| 30 | Lysarthas/Time-Transformer - [SDM24] Official code for "Time-Transformer" | | Experimental |
| 31 | caetas/GenerativeZoo - Model Zoo for Generative Models. | | Experimental |
| 32 | grantzyr/MM-Health-Dataset - [EMNLP 2025 Findings] Official repo for paper: From Generation to Detection:... | | Experimental |
| 33 | zjowowen/FuncGenFoil - Airfoil Generation and Editing Model in Function Space | | Experimental |
| 34 | zealscott/SynMeter - A principled library for tuning, training and evaluating tabular data... | | Experimental |
| 35 | ViacheslavDanilov/generative_design - This repository is dedicated to the development of an approach based on... | | Experimental |
| 36 | Sreyan88/DALE - Code for EMNLP 2023 paper: DALE: Generative Data Augmentation for... | | Experimental |
| 37 | markweberdev/maskbit - Implementation of the paper "MaskBit: Embedding-free Image Generation from... | | Experimental |
| 38 | filipaldi/ai-font-generation-projects - AI Font Generation Benchmarks. Comparative analysis of AI font generation... | | Experimental |
| 39 | KonstantinosBarmpas/NeuroRVQ - NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models | | Experimental |
| 40 | Trustworthy-ML-Lab/posthoc-generative-cbm - [CVPR 2025] Concept Bottleneck Autoencoder (CB-AE) -- efficiently transform... | | Experimental |
| 41 | OpenProteinAI/openprotein-python - Simple python interface for the OpenProtein.AI REST API. | | Experimental |
| 42 | DorinDaniil/Garage - Cutting-edge Python library designed for generative image augmentation! | | Experimental |
| 43 | ELM-Research/ecg_nn - Research-oriented pretraining and evaluation pipelines for ECG-specific... | | Experimental |
| 44 | michelecafagna26/vl-shap - [Frontiers in AI Journal] Implementation of the paper "Interpreting Vision... | | Experimental |
| 45 | Lee-CBG/TCRGen - Self-Contemplating In-Context Learning Enhances T Cell Receptor Generation... | | Experimental |
| 46 | ML4ITS/synthetic-data - Generate synthetic time-series using generative adversarial networks.... | | Experimental |
| 47 | Diegomangasco/GenSUMO - Generative AI to create synthetic SUMO scenarios | | Experimental |
| 48 | jameszhou-gl/Coogee - Coogee: An integrated pipeline for generating and auditing clinically... | | Experimental |
| 49 | AmirhosseinHonardoust/Synthetic-Data-Artist - A professional, research-grade comparison of Gaussian Copula and Variational... | | Experimental |
| 50 | MorningStarTM/Synthetic-Data-Generator - This Project for Creating unified tool to generate synthetic data (text and... | | Experimental |
| 51 | kayua/SyntheticOceanAI - SyntheticOcean: Open-Source Library for Generating Synthetic Tabular Data +... | | Experimental |
| 52 | AIML-MED/Mirror-CFE - [ICCV25] Looking in the Mirror: A Faithful Counterfactual Explanation Method... | | Experimental |
| 53 | KonstantinosBarmpas/LaBraM-plus-plus - [NeurIPS 2025] Neural Information Processing Systems(2025) - Foundation... | | Experimental |
| 54 | Mycheaux/DB-conv - Self-supervised generative AI enables conversion of two non-overlapping... | | Experimental |
| 55 | Sreyan88/CoDa - Code for NAACL 2024 (Findings) Paper: CoDa: Constrained Generation based... | | Experimental |
| 56 | HowieHwong/DataGen - [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models | | Experimental |
| 57 | FishAres/RNP6 - Code for Recursive Neural Programs: A differentiable framework for learning... | | Experimental |
| 58 | vertaix/Alternators - This repository contains the implementation of **Alternators**, a novel... | | Experimental |
| 59 | Sreyan88/ACLM - Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data... | | Experimental |
| 60 | Sreyan88/ABEX - Code for ACL 2024 paper -- ABEX: Data Augmentation for Low-Resource NLU via... | | Experimental |
| 61 | dario-coscia/barnn - BARNN: A Bayesian Autoregressive and Recurrent Neural Network - Official Repository | | Experimental |
| 62 | alexkoulakos/explain-then-predict - Source code for the BlackBoxNLP 2024 @ EMNLP paper "Enhancing adversarial... | | Experimental |
| 63 | rubsj/ai-synthetic-data-generator - Synthetic dataset generation pipeline with Pydantic validation and... | | Experimental |
| 64 | Chun-Bae/eeg-emotion-gen-compare - Comparing generative models for EEG emotion classification. | | Experimental |
| 65 | kj14173/neuro-sequential-generative-core - A research-oriented implementation of sequential generative models for... | | Experimental |
| 66 | marquito3012/TFM - Generative AI framework for creating synthetic tabular data in... | | Experimental |
| 67 | Sreyan88/Synthio - Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio... | | Experimental |
| 68 | j9smith/generative-modelling - Notebook series exploring the theory and implementation of various generative models. | | Experimental |
| 69 | DanteTrb/fall-risk-predictor - A fullstack AI-powered web application to assess fall risk in patients with... | | Experimental |
| 70 | rizac/gmgt - Ground Motion Ground Truth is a collection of datasets of ground motion time... | | Experimental |
| 71 | yrodriguezmd/Synthetic_Medical_Tabular_Data - Generate synthetic medical data from a patient population dataset. | | Experimental |
| 72 | cMancio00/ebm-molecules - This is my thesis for Computer Science master degree at University of Florence | | Experimental |
| 73 | ImJaeSung/Synthesizers - Implementations of various synthesizers with pytorch. | | Experimental |
| 74 | NITHISHM2410/spatial-temporal-transformer - Spatial Temporal Transformer to capture Spatial and Temporal dynamics. | | Experimental |
| 75 | silvano315/Gen-AI-for-Data-Augmentation - This is the ninth project of AI Engineering Master. It aims to use... | | Experimental |
| 76 | Okja88/Visual-GenAI-Applications - A comprehensive portfolio of Visual Generative AI projects featuring... | | Experimental |
| 77 | wilhelmagren/syndgen - SYNthetic Data GENeration made easy for everyone, free and open-sourced. | | Experimental |
| 78 | shadowboxingskills/ppchainR - Your Probabilistic Modeling Copilot | | Experimental |
| 79 | tacclab/bio_dataset_manager - This tool facilitates the encoding of these sequences into tensors, which... | | Experimental |