SDGym and synthetic-data-generator

SDGym benchmarks synthetic data generation methods, while synthetic-data-generator (SDG) is a framework for generating high-quality structured tabular data. The two are complementary: SDG's models could be among the methods that SDGym benchmarks.

SDGym
Score: 79 (Verified)
Maintenance 13/25 | Adoption 18/25 | Maturity 25/25 | Community 23/25
Stars: 301 | Forks: 67 | Downloads: 1,273 | Commits (30d): 0
Language: Python | License: (none listed)
Risk flags: none

synthetic-data-generator
Score: 62 (Established)
Maintenance 13/25 | Adoption 10/25 | Maturity 16/25 | Community 23/25
Stars: 2,409 | Forks: 385 | Downloads: (none listed) | Commits (30d): 0
Language: Python | License: Apache-2.0
Risk flags: No Package, No Dependents
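
The page does not state how the four category scores combine into the overall score. The listed numbers are consistent with a plain sum, so the sketch below assumes that rule. The dictionary layout and the function name total_score are illustrative only and are not part of either project.

```python
# Category scores as listed on this page (each out of 25).
scores = {
    "SDGym": {
        "Maintenance": 13, "Adoption": 18, "Maturity": 25, "Community": 23,
    },
    "synthetic-data-generator": {
        "Maintenance": 13, "Adoption": 10, "Maturity": 16, "Community": 23,
    },
}


def total_score(components):
    """Sum the category scores (assumed aggregation rule, not documented here)."""
    return sum(components.values())


totals = {name: total_score(parts) for name, parts in scores.items()}
for name, total in totals.items():
    print(f"{name}: {total}")
```

Under this assumption the sums match the overall scores shown for both projects (79 and 62). The 16-point gap comes from Adoption and Maturity, since Maintenance and Community are identical.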

About SDGym

sdv-dev/SDGym

Benchmarking synthetic data generation methods.

About synthetic-data-generator

hitsz-ids/synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.

SDG supports multiple synthesis approaches: GAN-based models (CTGAN), statistical methods (GaussianCopula), and LLM-based generation for zero-shot synthesis, with automatic column relationship detection to improve data quality. It provides a modular, pluggable Data Processor system for type conversion and preprocessing (datetime conversion, null handling), and it is optimized for memory efficiency on billion-scale datasets. It also integrates differential privacy and anonymization capabilities, along with metadata inference for both single-table and multi-table scenarios.
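
To make the idea of pluggable preprocessing concrete, here is a minimal sketch of a processor chain that handles nulls and converts datetimes. It is not SDG's actual API: the class names FillNulls and DatetimeToOrdinal, the apply method, and run_pipeline are all hypothetical, chosen only to show how independent processors can be composed.

```python
from datetime import datetime


class FillNulls:
    """Hypothetical processor: replace None in one column with a default."""

    def __init__(self, column, default):
        self.column = column
        self.default = default

    def apply(self, rows):
        # Build new row dicts so the input rows are left unchanged.
        return [
            {**row, self.column: self.default}
            if row.get(self.column) is None else dict(row)
            for row in rows
        ]


class DatetimeToOrdinal:
    """Hypothetical processor: turn ISO date strings into integer day ordinals."""

    def __init__(self, column):
        self.column = column

    def apply(self, rows):
        out = []
        for row in rows:
            new_row = dict(row)
            value = new_row.get(self.column)
            if value is not None:  # missing dates are passed through as None
                new_row[self.column] = datetime.fromisoformat(value).toordinal()
            out.append(new_row)
        return out


def run_pipeline(rows, processors):
    """Apply each processor in order; any object with apply() can be plugged in."""
    for processor in processors:
        rows = processor.apply(rows)
    return rows


raw = [
    {"signup": "2024-01-02", "age": None},
    {"signup": None, "age": 30},
]
cleaned = run_pipeline(raw, [FillNulls("age", 0), DatetimeToOrdinal("signup")])
print(cleaned)
```

Because each processor only needs an apply method, new conversions can be added or reordered without touching the others, which is the general benefit a pluggable design aims for.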

Scores updated daily from GitHub, PyPI, and npm data.