SDV and synthetic-data-generator

The two tools appear to be **competitors**: both are designed to generate high-quality structured tabular synthetic data. SDV has noticeably higher adoption, with 3,439 GitHub stars versus 2,409 and 150,480 monthly downloads, while no download figure is listed for synthetic-data-generator.

| Metric | SDV | synthetic-data-generator |
|---|---|---|
| Overall score | 94 (Verified) | 62 (Established) |
| Maintenance | 23/25 | 13/25 |
| Adoption | 25/25 | 10/25 |
| Maturity | 25/25 | 16/25 |
| Community | 21/25 | 23/25 |
| Stars | 3,439 | 2,409 |
| Forks | 417 | 385 |
| Monthly downloads | 150,480 | (not listed) |
| Commits (30d) | 36 | 0 |
| Language | Python | Python |
| License | (not listed) | Apache-2.0 |
| Flags | No risk flags | No package, no dependents |

About SDV

sdv-dev/SDV

Synthetic data generation for tabular data

Supports multiple synthesis architectures, including statistical methods (GaussianCopula) and deep-learning approaches (CTGAN), for single-table, multi-table, and sequential datasets. Includes built-in evaluation metrics that compare synthetic to real data across column distributions and correlations, plus constraint enforcement and PII anonymization during generation.

About synthetic-data-generator

hitsz-ids/synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.

Leverages multiple synthesis approaches: GAN-based models (CTGAN), statistical methods (GaussianCopula), and LLM-based generation for zero-shot synthesis, with automatic detection of column relationships to improve data quality. A modular, pluggable Data Processor system handles type conversion and preprocessing (for example, datetime conversion and null handling), and memory use is optimized for billion-scale datasets. Differential privacy and anonymization are integrated alongside metadata inference for both single-table and multi-table scenarios.
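A pluggable data-processor design like the one described can be sketched as a chain of reversible transforms. Each processor is fitted on the data and converts it into a model-friendly form before training, then reverses the conversion on sampled output. This is a minimal standard-library sketch, not SDG's actual API. The classes `DatetimeProcessor`, `NullIndicatorProcessor`, and `ProcessorPipeline` and their `fit`/`convert`/`reverse_convert` methods are hypothetical names chosen for illustration, and rows are assumed to be plain dicts.

```python
from datetime import datetime, timezone
from statistics import fmean


class DatetimeProcessor:
    """Hypothetical processor: ISO-8601 strings <-> POSIX timestamps (float seconds)."""

    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        pass  # nothing to learn; the conversion is fixed

    def convert(self, rows):
        out = []
        for r in rows:
            dt = datetime.fromisoformat(r[self.column])
            if dt.tzinfo is None:  # assumption: naive timestamps are UTC
                dt = dt.replace(tzinfo=timezone.utc)
            out.append({**r, self.column: dt.timestamp()})
        return out

    def reverse_convert(self, rows):
        return [{**r, self.column: datetime.fromtimestamp(r[self.column], tz=timezone.utc).isoformat()}
                for r in rows]


class NullIndicatorProcessor:
    """Hypothetical processor: fills missing numbers with the column mean and adds a 0/1 flag."""

    def __init__(self, column):
        self.column = column
        self.flag = f"{column}_is_null"
        self.fill_value = 0.0

    def fit(self, rows):
        present = [r[self.column] for r in rows if r[self.column] is not None]
        self.fill_value = fmean(present) if present else 0.0

    def convert(self, rows):
        out = []
        for r in rows:
            missing = r[self.column] is None
            value = self.fill_value if missing else r[self.column]
            out.append({**r, self.column: value, self.flag: int(missing)})
        return out

    def reverse_convert(self, rows):
        out = []
        for r in rows:
            r = dict(r)  # copy so the caller's rows are not mutated
            if r.pop(self.flag) >= 0.5:  # a model may emit a fractional flag
                r[self.column] = None
            out.append(r)
        return out


class ProcessorPipeline:
    """Runs processors in order before training and in reverse order after sampling."""

    def __init__(self, processors):
        self.processors = processors

    def fit_convert(self, rows):
        for p in self.processors:
            p.fit(rows)  # each processor sees the output of the previous one
            rows = p.convert(rows)
        return rows

    def reverse_convert(self, rows):
        for p in reversed(self.processors):
            rows = p.reverse_convert(rows)
        return rows


if __name__ == "__main__":
    raw = [
        {"signup": "2024-01-01T00:00:00+00:00", "income": 50000.0},
        {"signup": "2024-06-01T12:30:00+00:00", "income": None},
    ]
    pipeline = ProcessorPipeline([DatetimeProcessor("signup"), NullIndicatorProcessor("income")])
    model_ready = pipeline.fit_convert(raw)
    print(model_ready)
    print(pipeline.reverse_convert(model_ready) == raw)
```

Recording missingness as an explicit flag column, rather than dropping nulls, lets a generative model learn how often nulls occur, so the reverse step can reintroduce them in synthetic rows. This is one common strategy; SDG may handle nulls differently.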


Scores are updated daily from GitHub, PyPI, and npm data.