Synthetic Data Generation ML Frameworks
Tools and frameworks for generating synthetic datasets across tabular, time-series, and domain-specific data modalities, including benchmarking and evaluation methods. Does NOT include real dataset collections, data augmentation techniques, or domain-specific applications that use synthetic data.
This category tracks 45 synthetic data generation frameworks. Seven score above 50, which places them in the Established tier. The highest-rated is tdspora/syngen at 62/100, with 18 stars and 2,652 monthly downloads. Two of the top 10 are actively maintained.
Get all 45 projects as JSON (note: the sample request below is capped at limit=20):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=synthetic-data-generation&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
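For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented on this page, so the sketch returns the decoded payload without assuming any field names. Whether the API accepts a `limit` larger than 20 is also an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL; `limit` mirrors the query parameter in the curl example."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "synthetic-data-generation",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=10):
    """Perform the GET request and decode the JSON body.

    The response schema is an unknown here, so the decoded object is
    returned as-is for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects()` performs one network request, which counts against the daily quota described above. Inspecting the returned object first (for example, its type and top-level keys) is a reasonable way to learn the response layout before writing code that depends on it.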
| # | Framework | Score | Tier |
|---|---|---|---|
| 1 | tdspora/syngen: Open-source version of the TDspora synthetic data generation algorithm. | | Established |
| 2 | Diyago/Tabular-data-generation: We well know GANs for success in the realistic image generation. However,... | | Established |
| 3 | meta-llama/synthetic-data-kit: Tool for generating high quality Synthetic datasets | | Established |
| 4 | always-further/deepfabric: Generate High-Quality Synthetics, Train, Measure, and Evaluate in a Single Pipeline | | Established |
| 5 | Data-Centric-AI-Community/ydata-synthetic: Synthetic data generators for tabular and time-series data | | Established |
| 6 | wiseodd/generative-models: Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow. | | Established |
| 7 | vanderschaarlab/synthcity: A library for generating and evaluating synthetic tabular data for privacy,... | | Established |
| 8 | AlejandroBeldaFernandez/Calm-Data_Generator: CALM-Data-Generator is a comprehensive Python library for synthetic data... | | Emerging |
| 9 | bensonlee5/dagzoo: Synthetic tabular data generator for causal modeling | | Emerging |
| 10 | aliseyfi75/COSCI-GAN: Codebase for "Generating multivariate time series with COmmon Source... | | Emerging |
| 11 | tirthajyoti/Synthetic-data-gen: Various methods for generating synthetic data for data science and ML | | Emerging |
| 12 | shayneobrien/generative-models: Annotated, understandable, and visually interpretable PyTorch... | | Emerging |
| 13 | martinjurkovic/syntherela: A package for benchmarking synthetic relational data generation methods | | Emerging |
| 14 | alfurka/synloc: A Python Package to Create Synthetic Tabular Data | | Emerging |
| 15 | SAGDAfrica/sagda: Synthetic Agriculture Data for Africa | | Emerging |
| 16 | Team-TUD/CTAB-GAN: Official git for "CTAB-GAN: Effective Table Data Synthesizing" | | Emerging |
| 17 | federicoarenasl/sdg-engine: A simple data generation engine for computer vision, compatible with 🤗 datasets. | | Emerging |
| 18 | gretelai/trainer: Simple interface to synthesize complex and highly dimensional datasets using... | | Emerging |
| 19 | stefan-jansen/synthetic-data-for-finance: Material for QuantUniversity talk on Synthetic Data Generation for Finance. | | Emerging |
| 20 | MRSAIL-Mini-Robotics-Software-AI-Lab/GANVAS-models: Generative Autoregressive, Normalized Flows, VAEs, Score-based models (GANVAS) | | Experimental |
| 21 | TrevorW-code/fraud: synthetic data for ml | | Experimental |
| 22 | AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark: Autocurator is a comprehensive benchmarking toolkit for evaluating synthetic... | | Experimental |
| 23 | abideenml/AutoSynth: Automatically create synthetic data using SOTA techniques (Self Instruct,... | | Experimental |
| 24 | antorguez95/synthetic_data_generation_framework: This repository contains the code of our published work in IEEE JBHI. Our... | | Experimental |
| 25 | CFA-Institute-RPC/Synthetic-Data-For-Finance: This repository contains accompanying code for the CFA Institute's Research... | | Experimental |
| 26 | GarouMonste/Teaching-Neural-Networks-to-Imagine-Tables: 🛠️ Develop a Variational Autoencoder to generate realistic tabular data,... | | Experimental |
| 27 | Data-Centric-AI-Community/nist-crc-2023: NIST Collaborative Research Cycle on Synthetic Data. Learn about Synthetic... | | Experimental |
| 28 | DerwenAI/kleptosyn: Synthetic data generation for investigative graphs based on patterns of... | | Experimental |
| 29 | Rufina46/time-series-synthetic: Open-source synthetic time-series generator for ML testing | | Experimental |
| 30 | dataxid/dataxid-python: The Synthetic Data API. Generate privacy-safe synthetic data with 5 lines of code. | | Experimental |
| 31 | Melckykaisha/synthetic-data-generation-demo: Interactive demonstration of synthetic data generation using GANs and VAEs... | | Experimental |
| 32 | oRyyu2703/Autocurator-Synthetic-Data-Benchmark: 🔍 Evaluate synthetic data quality against real tabular datasets with... | | Experimental |
| 33 | hipaasynth-svg/HipAAsynth: Deterministic synthetic clinical data engine. Zero dependencies. Fully reproducible. | | Experimental |
| 34 | EPFL-ENAC/TOPO-DataGen: [CVPR'22] TOPO-DataGen: an open and scalable aerial synthetic data... | | Experimental |
| 35 | jaimeperezsanchez/GAN_Scenario_Forecasting: Data augmentation through multivariate scenario forecasting in Data Centers... | | Experimental |
| 36 | EmrahFidan/MissingLink: Synthetic tabular data generation engine, CTGAN deep learning for CSV... | | Experimental |
| 37 | julsngbatac/GANs-For-Synthetic-Data-Generation: 🤖 Generate realistic synthetic data using GANs to boost AI model training... | | Experimental |
| 38 | aia39/Synthetic-Tabular-Data-Generation-using-CTGAN-and-classify-with-XGboost: This is the repository to generate synthetic tabular data when the tabular... | | Experimental |
| 39 | Hunny-Mane/Polygen: PolyGen is a technical demonstration of high-concurrency data visualization... | | Experimental |
| 40 | Fixer1983/synthetic-data-gen: Scalable synthetic data generation for training robust ML models. | | Experimental |
| 41 | anurag-3-nair/Synthetic-Driving-Data-Generation-Pipeline: A personal project to investigate a machine-learning synthetic data generation. | | Experimental |
| 42 | abdulvahapmutlu/als-synthetic-data-augmentation-wgan: This project aims to address the lack of EEG signals for ALS (Amyotrophic... | | Experimental |
| 43 | volodya7292/synthetic_data: Synthetic data generation system library. | | Experimental |
| 44 | MongoExpUser/Synthetic-Drilling-Data-App-for-Sqlite-ML: Generate synthetic drilling data that can be used for testing machine... | | Experimental |
| 45 | wildanjr19/generative-model: Learn and build generative model from scratch, mostly in PyTorch | | Experimental |