lucasnewman/nanospeech

A simple, hackable text-to-speech system in PyTorch and MLX

58 / 100 (Established)

Implements end-to-end flow matching that handles text alignment and waveform generation jointly, with no auxiliary models such as forced aligners. The ~1,500-line implementations in both PyTorch and MLX keep the code easy to experiment with. The 82M-parameter model trains efficiently (days on an H100) using only public domain data, and runs inference at 3-5x real time on Apple Silicon and modern GPUs. It supports voice cloning from reference audio and integrates with WebDataset and Hugging Face Accelerate for scalable multi-GPU training in PyTorch.

186 stars and 616 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale 6m
Maintenance 2 / 25
Adoption 16 / 25
Maturity 25 / 25
Community 15 / 25
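
The four sub-scores above add up to the overall score shown at the top of the page. The short sketch below checks that arithmetic. It assumes the total is a plain sum of the sub-scores, which is an inference from the numbers displayed here and not a documented formula.

```python
# Sub-scores as displayed on this page, each out of 25.
sub_scores = {
    "Maintenance": 2,
    "Adoption": 16,
    "Maturity": 25,
    "Community": 15,
}

# Assumption: the overall score is the unweighted sum of the sub-scores.
total = sum(sub_scores.values())
maximum = 25 * len(sub_scores)

print(f"{total} / {maximum}")  # prints "58 / 100"
```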


Stars: 186
Forks: 21
Language: Python
License: MIT
Last pushed: Aug 03, 2025
Monthly downloads: 616
Commits (30d): 0
Dependencies: 13

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/lucasnewman/nanospeech"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
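
The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the meaning of the `voice-ai` path segment as a category is a guess from the URL above, and the response is assumed to be JSON, which the page does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and parse the quality record (hypothetical helper).

    Assumption: the endpoint returns a JSON body; its fields are not
    documented on this page, so none are accessed here.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# record = fetch_quality("voice-ai", "lucasnewman", "nanospeech")
# print(record)
```

Unauthenticated use counts against the 100 requests/day limit mentioned above, so it is worth caching results when polling several repositories.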