apluka34/Bud500

Bud500: A Comprehensive Vietnamese ASR Dataset

44 / 100 (Emerging)

Bud500 spans 500 hours of multi-regional Vietnamese speech across diverse topics (podcasts, travel, food), sampled at 16 kHz and organized as 634K training samples paired with transcriptions. It is hosted on Hugging Face Datasets in Parquet format, which supports both streaming and batch loading via the `datasets` library. VietAI curated it from publicly sourced material to provide regional accent diversity for reproducible ASR research.
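As a rough illustration of the streaming versus batch loading mentioned above, here is a minimal sketch using `datasets.load_dataset`. The Hub identifier `linhtran92/viet_bud500` is an assumption not stated on this page, and the helper name `load_bud500` is hypothetical; check the dataset card for the actual ID and column names before relying on it.

```python
# Assumed Hugging Face Hub ID for Bud500; not given on this page, verify first.
DATASET_ID = "linhtran92/viet_bud500"


def load_bud500(split="train", streaming=True):
    """Load one split of the dataset (hypothetical helper).

    streaming=True iterates over samples without downloading every shard;
    streaming=False downloads and caches the whole split (batch loading).
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=streaming)


if __name__ == "__main__":
    ds = load_bud500(streaming=True)
    first = next(iter(ds))
    # Print the column names rather than assuming them.
    print(sorted(first.keys()))
```

Streaming is usually the safer default for a corpus of this size, since a full batch download pulls every Parquet shard to local disk.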

No Package · No Dependents
Maintenance 6 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 69
Forks: 9
Language: (none listed)
License: Apache-2.0
Last pushed: Oct 10, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/apluka34/Bud500"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
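The same request can be made from Python with only the standard library. This is a sketch under assumptions: the `category/owner/repo` path layout is inferred from the single URL shown above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from one example."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record, assuming the endpoint returns JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Unauthenticated calls count against the 100 requests/day limit.
    print(fetch_quality("voice-ai", "apluka34", "Bud500"))
```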