neosun100/Step-Audio-R1.1

Step-Audio-R1.1: The First Audio Language Model with Test-Time Compute Scaling. All-in-One Docker with vLLM + Web UI + API.

Score: 36 / 100 (Emerging)

This tool processes long audio recordings. You feed it audio files up to 85 minutes long (longer recordings are segmented automatically), and it returns transcriptions, summaries, translations, or deeper analysis of the content. It is designed for professionals such as researchers, content analysts, and intelligence officers who need to extract actionable information from spoken content efficiently.

Use this if you need to rapidly process and understand large volumes of audio data, whether for detailed transcriptions, concise summaries, cross-language understanding, or in-depth content analysis.

Not ideal if you don't have access to powerful NVIDIA GPUs (at least four, each with 40 GB of VRAM) or if your primary need is not large-scale audio processing.

audio-analysis transcription content-summarization speech-translation media-intelligence
No Package No Dependents
Maintenance 10 / 25
Adoption 3 / 25
Maturity 11 / 25
Community 12 / 25
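
The four sub-scores above add up to the headline figure of 36 out of 100. The short check below only illustrates that arithmetic. It assumes the total is a plain sum of the four categories, which matches the numbers shown here but is not a formula stated on this page. The names `SUBSCORES`, `MAX_PER_CATEGORY`, and `total_score` are hypothetical.

```python
# Sub-scores as listed on this page; each category is scored out of 25.
SUBSCORES = {
    "Maintenance": 10,
    "Adoption": 3,
    "Maturity": 11,
    "Community": 12,
}
MAX_PER_CATEGORY = 25


def total_score(subscores: dict[str, int]) -> int:
    """Sum the category scores (assumption: the headline score is a plain sum)."""
    return sum(subscores.values())


def max_score(subscores: dict[str, int]) -> int:
    """Highest possible total given the number of categories."""
    return MAX_PER_CATEGORY * len(subscores)
```

With the values above, `total_score(SUBSCORES)` gives 36 and `max_score(SUBSCORES)` gives 100, which agrees with the score shown at the top.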


Stars: 4
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: Jan 18, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/neosun100/Step-Audio-R1.1"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
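
The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client. The URL pattern is copied from the curl example, while the helper names `build_quality_url` and `fetch_quality` are hypothetical. It also assumes the endpoint returns JSON, which this page does not state. The response fields are not documented here, so the sketch returns the decoded body as-is.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    # Percent-encode each path segment so unusual characters stay valid.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("voice-ai", "neosun100", "Step-Audio-R1.1")
# print(data)
```

Without a key, each call counts against the 100 requests/day limit, so cache the result rather than fetching it repeatedly.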