Synthetic Data Generation ML Frameworks

Tools and frameworks for generating synthetic datasets across tabular, time-series, and domain-specific data modalities, including benchmarking and evaluation methods. Does NOT include real dataset collections, data augmentation techniques, or domain-specific applications that use synthetic data.

This page tracks 45 synthetic data generation frameworks. Seven score 50 or above (Established tier). The highest-rated are tdspora/syngen (18 stars, 2,652 monthly downloads) and Diyago/Tabular-data-generation, tied at 62/100. Two of the top 10 are actively maintained.

Get the project list as JSON (the example request sets `limit=20`, so it returns at most 20 of the 45 projects):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=synthetic-data-generation&limit=20"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
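The same request can be made from Python with only the standard library. This is a minimal sketch that assumes nothing beyond what the curl command shows: the endpoint and its three query parameters (`domain`, `subcategory`, `limit`). The response schema is not described on this page, so the sketch decodes the JSON and returns it unchanged. It also leaves out API-key handling, since the page does not say how a key is sent. `build_query_url` and `fetch_projects` are illustrative names, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_query_url(domain: str = "ml-frameworks",
                    subcategory: str = "synthetic-data-generation",
                    limit: int = 20) -> str:
    """Build the same URL as the curl example (defaults match it exactly)."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(url: str, timeout: float = 10.0):
    """GET the URL and decode the JSON body.

    The response structure is not documented here, so the parsed value
    is returned as-is. Each call counts toward the anonymous daily limit.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (performs a real network request):
# projects = fetch_projects(build_query_url())
```

Keeping URL construction separate from the network call makes the query easy to inspect or test without spending any of the 100 anonymous requests per day.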

# Framework Score Tier
1 tdspora/syngen

Open-source version of the TDspora synthetic data generation algorithm.

62
Established
2 Diyago/Tabular-data-generation

GANs are well known for their success in realistic image generation. However,...

62
Established
3 meta-llama/synthetic-data-kit

Tool for generating high-quality synthetic datasets

56
Established
4 always-further/deepfabric

Generate High-Quality Synthetics, Train, Measure, and Evaluate in a Single Pipeline

53
Established
5 Data-Centric-AI-Community/ydata-synthetic

Synthetic data generators for tabular and time-series data

52
Established
6 wiseodd/generative-models

Collection of generative models, e.g. GAN, VAE in PyTorch and TensorFlow.

51
Established
7 vanderschaarlab/synthcity

A library for generating and evaluating synthetic tabular data for privacy,...

50
Established
8 AlejandroBeldaFernandez/Calm-Data_Generator

CALM-Data-Generator is a comprehensive Python library for synthetic data...

41
Emerging
9 bensonlee5/dagzoo

Synthetic tabular data generator for causal modeling

40
Emerging
10 aliseyfi75/COSCI-GAN

Codebase for "Generating multivariate time series with COmmon Source...

39
Emerging
11 tirthajyoti/Synthetic-data-gen

Various methods for generating synthetic data for data science and ML

39
Emerging
12 shayneobrien/generative-models

Annotated, understandable, and visually interpretable PyTorch...

39
Emerging
13 martinjurkovic/syntherela

A package for benchmarking synthetic relational data generation methods

39
Emerging
14 alfurka/synloc

A Python Package to Create Synthetic Tabular Data

38
Emerging
15 SAGDAfrica/sagda

Synthetic Agriculture Data for Africa

38
Emerging
16 Team-TUD/CTAB-GAN

Official git for "CTAB-GAN: Effective Table Data Synthesizing"

37
Emerging
17 federicoarenasl/sdg-engine

A simple data generation engine for computer vision, compatible with 🤗 datasets.

33
Emerging
18 gretelai/trainer

Simple interface to synthesize complex and high-dimensional datasets using...

32
Emerging
19 stefan-jansen/synthetic-data-for-finance

Material for QuantUniversity talk on Synthetic Data Generation for Finance.

32
Emerging
20 MRSAIL-Mini-Robotics-Software-AI-Lab/GANVAS-models

Generative Autoregressive, Normalizing Flows, VAEs, Score-based models (GANVAS)

28
Experimental
21 TrevorW-code/fraud

Synthetic data for ML

27
Experimental
22 AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark

Autocurator is a comprehensive benchmarking toolkit for evaluating synthetic...

26
Experimental
23 abideenml/AutoSynth

Automatically create synthetic data using SOTA techniques (Self Instruct,...

25
Experimental
24 antorguez95/synthetic_data_generation_framework

This repository contains the code of our published work in IEEE JBHI. Our...

25
Experimental
25 CFA-Institute-RPC/Synthetic-Data-For-Finance

This repository contains accompanying code for the CFA Institute's Research...

24
Experimental
26 GarouMonste/Teaching-Neural-Networks-to-Imagine-Tables

🛠️ Develop a Variational Autoencoder to generate realistic tabular data,...

23
Experimental
27 Data-Centric-AI-Community/nist-crc-2023

NIST Collaborative Research Cycle on Synthetic Data. Learn about Synthetic...

23
Experimental
28 DerwenAI/kleptosyn

Synthetic data generation for investigative graphs based on patterns of...

23
Experimental
29 Rufina46/time-series-synthetic

Open-source synthetic time-series generator for ML testing

23
Experimental
30 dataxid/dataxid-python

The Synthetic Data API. Generate privacy-safe synthetic data with 5 lines of code.

22
Experimental
31 Melckykaisha/synthetic-data-generation-demo

Interactive demonstration of synthetic data generation using GANs and VAEs...

22
Experimental
32 oRyyu2703/Autocurator-Synthetic-Data-Benchmark

🔍 Evaluate synthetic data quality against real tabular datasets with...

22
Experimental
33 hipaasynth-svg/HipAAsynth

Deterministic synthetic clinical data engine. Zero dependencies. Fully reproducible.

22
Experimental
34 EPFL-ENAC/TOPO-DataGen

[CVPR'22] TOPO-DataGen: an open and scalable aerial synthetic data...

19
Experimental
35 jaimeperezsanchez/GAN_Scenario_Forecasting

Data augmentation through multivariate scenario forecasting in Data Centers...

19
Experimental
36 EmrahFidan/MissingLink

Synthetic tabular data generation engine — CTGAN deep learning for CSV...

14
Experimental
37 julsngbatac/GANs-For-Synthetic-Data-Generation

🤖 Generate realistic synthetic data using GANs to boost AI model training...

14
Experimental
38 aia39/Synthetic-Tabular-Data-Generation-using-CTGAN-and-classify-with-XGboost

This is the repository to generate synthetic tabular data when the tabular...

14
Experimental
39 Hunny-Mane/Polygen

PolyGen is a technical demonstration of high-concurrency data visualization...

14
Experimental
40 Fixer1983/synthetic-data-gen

Scalable synthetic data generation for training robust ML models.

14
Experimental
41 anurag-3-nair/Synthetic-Driving-Data-Generation-Pipeline

A personal project investigating machine-learning-based synthetic data generation.

14
Experimental
42 abdulvahapmutlu/als-synthetic-data-augmentation-wgan

This project aims to address the lack of EEG signals for ALS (Amyotrophic...

12
Experimental
43 volodya7292/synthetic_data

Synthetic data generation system library.

11
Experimental
44 MongoExpUser/Synthetic-Drilling-Data-App-for-Sqlite-ML

Generate synthetic drilling data that can be used for testing machine...

11
Experimental
45 wildanjr19/generative-model

Learn and build generative models from scratch, mostly in PyTorch

11
Experimental