SDV and synthetic-data-generator

The two tools appear to be **competitors**: both are designed to generate high-quality structured tabular synthetic data. SDV has clearly higher adoption, with more GitHub stars (3,439 vs. 2,409) and 150,480 monthly downloads, while no download figure is listed for synthetic-data-generator.

| | SDV | synthetic-data-generator |
|---|---|---|
| Badge | Verified | Established |
| Maintenance | 23/25 | 13/25 |
| Adoption | 25/25 | 10/25 |
| Maturity | 25/25 | 16/25 |
| Community | 21/25 | 23/25 |
| Stars | 3,439 | 2,409 |
| Forks | 417 | 385 |
| Downloads (monthly) | 150,480 | Not listed |
| Commits (30d) | 36 | 0 |
| Language | Python | Python |
| License | Not listed | Apache-2.0 |
| Risk flags | None | No package, no dependents |

About SDV

sdv-dev/SDV

Synthetic data generation for tabular data

Supports multiple synthesis architectures, including statistical methods (GaussianCopula) and deep-learning approaches (CTGAN), for single-table, multi-table, and sequential datasets. Includes built-in evaluation metrics that compare synthetic to real data across column distributions and correlations, plus constraint enforcement and PII anonymization during generation.

About synthetic-data-generator

hitsz-ids/synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.

Leverages multiple synthesis approaches, including GAN-based models (CTGAN), statistical models (GaussianCopula), and LLM-based generation for zero-shot synthesis, with automatic column-relationship detection to improve data quality. Features a modular, pluggable Data Processor system for type conversion and preprocessing (for example, datetime conversion and null handling), plus memory optimizations aimed at billion-scale datasets. Integrates differential privacy and anonymization capabilities alongside metadata inference for both single-table and multi-table scenarios.
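The idea of a pluggable preprocessing chain can be illustrated with a small, self-contained sketch. This does not reproduce SDG's actual Data Processor classes: the names `DatetimeProcessor`, `NullProcessor`, and `ProcessorPipeline`, and the `fit`/`convert`/`reverse_convert` methods, are hypothetical. Each processor turns raw rows into model-ready values and can undo its own transformation, so generated data can be mapped back to the original schema.

```python
from datetime import datetime, timezone


class DatetimeProcessor:
    """Hypothetical processor: date strings <-> UTC POSIX timestamps (floats)."""

    def __init__(self, column, fmt="%Y-%m-%d"):
        self.column, self.fmt = column, fmt

    def fit(self, rows):
        return self  # a fixed format needs no learned state

    def convert(self, rows):
        return [{**r, self.column: None if r[self.column] is None else
                 datetime.strptime(r[self.column], self.fmt).replace(tzinfo=timezone.utc).timestamp()}
                for r in rows]

    def reverse_convert(self, rows):
        return [{**r, self.column: None if r[self.column] is None else
                 datetime.fromtimestamp(r[self.column], tz=timezone.utc).strftime(self.fmt)}
                for r in rows]


class NullProcessor:
    """Hypothetical processor: fill nulls with the column mean and keep an indicator column."""

    def __init__(self, column):
        self.column = column
        self.flag = f"{column}__is_null"

    def fit(self, rows):
        present = [r[self.column] for r in rows if r[self.column] is not None]
        self.fill_value = sum(present) / len(present)
        return self

    def convert(self, rows):
        return [{**r,
                 self.column: self.fill_value if r[self.column] is None else r[self.column],
                 self.flag: r[self.column] is None}
                for r in rows]

    def reverse_convert(self, rows):
        restored = []
        for r in rows:
            r = dict(r)
            if r.pop(self.flag):  # re-inject the null the indicator recorded
                r[self.column] = None
            restored.append(r)
        return restored


class ProcessorPipeline:
    """Fit and apply processors in order; undo them in reverse order."""

    def __init__(self, processors):
        self.processors = processors

    def fit_convert(self, rows):
        for p in self.processors:
            rows = p.fit(rows).convert(rows)
        return rows

    def reverse_convert(self, rows):
        for p in reversed(self.processors):
            rows = p.reverse_convert(rows)
        return rows


raw = [
    {"signup": "2024-01-15", "age": 34.0},
    {"signup": None, "age": None},
    {"signup": "2024-03-02", "age": 41.0},
]
pipeline = ProcessorPipeline([DatetimeProcessor("signup"), NullProcessor("signup"), NullProcessor("age")])
model_ready = pipeline.fit_convert(raw)   # all-numeric, no nulls
restored = pipeline.reverse_convert(model_ready)
```

Keeping an indicator column makes null handling reversible, so a synthesizer trained on `model_ready` can also learn how often values are missing. Reversing in the opposite order guarantees each processor undoes exactly the step it applied.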


Scores updated daily from GitHub, PyPI, and npm data.