SDGym and synthetic-data-generator
SDGym benchmarks synthetic data generation methods, while synthetic-data-generator (SDG) is a specialized framework for generating high-quality structured tabular data. The two are complementary: SDG could be one of the methods that SDGym benchmarks.
About SDGym
sdv-dev/SDGym
Benchmarking synthetic data generation methods.
About synthetic-data-generator
hitsz-ids/synthetic-data-generator
SDG is a specialized framework designed to generate high-quality structured tabular data.
SDG supports multiple synthesis approaches, including GAN-based models (CTGAN), statistical methods (GaussianCopula), and LLM-based generation for zero-shot synthesis, with automatic column relationship detection to improve data quality. It features a modular, pluggable Data Processor system for type conversion and preprocessing (datetime and null handling), and is optimized for memory efficiency on billion-scale datasets. It also integrates differential privacy and anonymization capabilities alongside metadata inference for both single-table and multi-table scenarios.
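To make the idea of a pluggable data processor concrete, here is a minimal sketch in plain Python. It is not SDG's actual API: the names `Processor`, `NullFiller`, `DatetimeEncoder`, and `run_pipeline` are hypothetical and chosen only for illustration. It assumes each processor exposes a single `convert` step and that processors are applied in order before any model sees the data.

```python
from datetime import datetime
from typing import Dict, List, Optional, Protocol

Row = Dict[str, Optional[str]]


class Processor(Protocol):
    """Hypothetical plug-in interface: transform rows before training."""

    def convert(self, rows: List[dict]) -> List[dict]: ...


class NullFiller:
    """Replace missing values in one column with a fixed default."""

    def __init__(self, column: str, default: str) -> None:
        self.column = column
        self.default = default

    def convert(self, rows: List[dict]) -> List[dict]:
        out = []
        for row in rows:
            value = row.get(self.column)
            # Copy the row so the caller's data is not mutated.
            out.append({**row, self.column: self.default if value is None else value})
        return out


class DatetimeEncoder:
    """Turn a date-string column into a numeric ordinal day."""

    def __init__(self, column: str, fmt: str = "%Y-%m-%d") -> None:
        self.column = column
        self.fmt = fmt

    def convert(self, rows: List[dict]) -> List[dict]:
        return [
            {**row, self.column: datetime.strptime(row[self.column], self.fmt).toordinal()}
            for row in rows
        ]


def run_pipeline(rows: List[dict], processors: List[Processor]) -> List[dict]:
    """Apply each processor in order; any object with `convert` can be plugged in."""
    for processor in processors:
        rows = processor.convert(rows)
    return rows


raw: List[Row] = [
    {"signup": "2024-01-01", "plan": "pro"},
    {"signup": None, "plan": "free"},
]
processed = run_pipeline(
    raw,
    [NullFiller("signup", "1970-01-01"), DatetimeEncoder("signup")],
)
```

Order matters in this sketch: null handling runs before datetime conversion so the encoder never sees a missing value. A real framework would also need the inverse step to map generated values back to their original types, which is omitted here.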