SDV and synthetic-data-generator
The two tools appear to be **competitors**: both are designed to generate high-quality structured tabular synthetic data. SDV has significantly higher adoption, as measured by GitHub stars and monthly downloads.
About SDV
sdv-dev/SDV
Synthetic data generation for tabular data
Supports multiple synthesis architectures, including statistical methods (GaussianCopula) and deep learning approaches (CTGAN), for single-table, multi-table, and sequential datasets. Includes built-in evaluation metrics that compare synthetic to real data across column distributions and correlations, plus constraint enforcement and PII anonymization during generation.
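The sketch below shows the idea behind the statistical approach and the two kinds of evaluation mentioned above. It is a minimal, stdlib-only Gaussian copula plus two simple scores. It is **not** SDV's implementation or API. The sketch assumes numeric columns only and uses empirical marginals, so it can only resample observed values. The names `GaussianCopulaSketch`, `ks_complement`, and `correlation_similarity` are hypothetical. The copula step rank-transforms each column to standard-normal scores, estimates their correlation matrix, and samples correlated normals via a Cholesky factor. It then maps each value back through the column's empirical quantiles. The distribution score is one minus the Kolmogorov–Smirnov statistic. The correlation score is one minus half the absolute difference in Pearson correlation.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, stdev 1


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def normal_scores(values):
    """Rank-transform a column, then map ranks to standard-normal quantiles."""
    n = len(values)
    scores = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=values.__getitem__), start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))  # rank/(n+1) stays inside (0, 1)
    return scores


def cholesky(m):
    """Lower-triangular L with L @ L.T == m, for a symmetric positive-definite m."""
    k = len(m)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))  # guard against round-off
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


class GaussianCopulaSketch:
    """Numeric-only Gaussian copula with empirical marginals (hypothetical helper)."""

    def fit(self, columns):
        self.names = list(columns)
        self.sorted_cols = {name: sorted(columns[name]) for name in self.names}
        z = [normal_scores(columns[name]) for name in self.names]
        corr = [[pearson(zi, zj) for zj in z] for zi in z]
        self.chol = cholesky(corr)
        return self

    def sample(self, num_rows, seed=0):
        rng = random.Random(seed)
        k = len(self.names)
        out = {name: [] for name in self.names}
        for _ in range(num_rows):
            eps = [rng.gauss(0.0, 1.0) for _ in range(k)]
            # Correlate independent normals: z = L @ eps.
            z = [sum(self.chol[i][p] * eps[p] for p in range(i + 1)) for i in range(k)]
            for i, name in enumerate(self.names):
                col = self.sorted_cols[name]
                u = STD_NORMAL.cdf(z[i])  # uniform on (0, 1)
                out[name].append(col[min(int(u * len(col)), len(col) - 1)])  # empirical quantile
        return out


def ks_complement(real, synth):
    """1 - (max gap between the two empirical CDFs); 1.0 means identical distributions."""
    r, s = sorted(real), sorted(synth)
    gap = max(abs(bisect_right(r, x) / len(r) - bisect_right(s, x) / len(s))
              for x in set(r) | set(s))
    return 1.0 - gap


def correlation_similarity(real, synth, a, b):
    """1 - |corr_real - corr_synth| / 2, so 1.0 means the pairwise correlation is preserved."""
    return 1.0 - abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b])) / 2


# Usage on toy data: y depends linearly on x plus noise.
rng = random.Random(42)
xs = [rng.gauss(50, 10) for _ in range(500)]
real = {"x": xs, "y": [2 * x + rng.gauss(0, 5) for x in xs]}
synth = GaussianCopulaSketch().fit(real).sample(500, seed=1)
print("KS complement (x):", round(ks_complement(real["x"], synth["x"]), 3))
print("correlation similarity (x, y):", round(correlation_similarity(real, synth, "x", "y"), 3))
```

Separating the dependence structure (the correlation of normal scores) from the marginals is what makes a copula useful here. Each column keeps its own distribution while the pairwise relationships are carried by the correlated normals.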
About synthetic-data-generator
hitsz-ids/synthetic-data-generator
SDG is a specialized framework designed to generate high-quality structured tabular data.
Leverages multiple synthesis approaches, including GAN-based models (CTGAN), statistical methods (GaussianCopula), and LLM-based generation for zero-shot synthesis, with automatic detection of column relationships to improve data quality. Features a modular, pluggable Data Processor system for type conversion and preprocessing (such as datetime conversion and null handling), plus memory optimizations for billion-scale datasets. Integrates differential privacy and anonymization capabilities, along with metadata inference for both single-table and multi-table scenarios.
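The pluggable preprocessing idea described above can be sketched as a chain of small, reversible steps. This is a stdlib-only sketch under stated assumptions. The classes `Processor`, `DatetimeProcessor`, `NullProcessor`, and `Pipeline` and their `fit`/`convert`/`reverse_convert` methods are hypothetical and are not SDG's actual API. The sketch represents a table as a dict mapping column names to Python lists. It also assumes each step must be invertible so that sampled data can be mapped back to the original types.

```python
from datetime import datetime, timezone


class Processor:
    """Hypothetical plug-in interface (not SDG's API): learn, convert, and invert."""

    def fit(self, data):
        return self

    def convert(self, data):
        raise NotImplementedError

    def reverse_convert(self, data):
        raise NotImplementedError


class DatetimeProcessor(Processor):
    """Turn date strings into POSIX seconds (UTC) so a model sees a number."""

    def __init__(self, column, fmt="%Y-%m-%d"):
        self.column, self.fmt = column, fmt

    def convert(self, data):
        out = dict(data)
        out[self.column] = [
            None if v is None
            else datetime.strptime(v, self.fmt).replace(tzinfo=timezone.utc).timestamp()
            for v in data[self.column]
        ]
        return out

    def reverse_convert(self, data):
        out = dict(data)
        out[self.column] = [
            None if v is None
            else datetime.fromtimestamp(v, tz=timezone.utc).strftime(self.fmt)
            for v in data[self.column]
        ]
        return out


class NullProcessor(Processor):
    """Fill missing numbers with the column mean and keep a flag column to restore them."""

    def __init__(self, column):
        self.column, self.flag = column, column + "_is_null"

    def fit(self, data):
        present = [v for v in data[self.column] if v is not None]
        self.fill = sum(present) / len(present) if present else 0.0
        return self

    def convert(self, data):
        out = dict(data)
        out[self.column] = [self.fill if v is None else v for v in data[self.column]]
        out[self.flag] = [v is None for v in data[self.column]]
        return out

    def reverse_convert(self, data):
        out = dict(data)
        flags = out.pop(self.flag)
        out[self.column] = [None if f else v for v, f in zip(data[self.column], flags)]
        return out


class Pipeline:
    """Fit and apply processors in order; undo them in reverse order."""

    def __init__(self, processors):
        self.processors = processors

    def fit_convert(self, data):
        for p in self.processors:
            data = p.fit(data).convert(data)
        return data

    def reverse_convert(self, data):
        for p in reversed(self.processors):
            data = p.reverse_convert(data)
        return data


raw = {"signup": ["2024-01-05", None, "2024-03-10"], "amount": [10.0, None, 30.0]}
pipe = Pipeline([DatetimeProcessor("signup"), NullProcessor("signup"), NullProcessor("amount")])
encoded = pipe.fit_convert(raw)  # all-numeric columns plus *_is_null flag columns
restored = pipe.reverse_convert(encoded)
print(encoded["amount"], encoded["amount_is_null"])
print(restored == raw)
```

Undoing the steps in reverse order keeps the design composable, because each processor only has to invert its own change. New column types can then be supported by adding another processor without touching the others.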
Scores updated daily from GitHub, PyPI, and npm data.