linto-stt and linto-studio
linto-stt is a backend API that provides automatic speech recognition, while linto-studio is a frontend interface for transcribing and annotating audio and video. The two are complementary tools within the LinTO ecosystem.
About linto-stt
linto-ai/linto-stt
An automatic speech recognition API
Supports multiple interchangeable STT engines (NeMo, Whisper, Kaldi, Kyutai) that can be deployed in three operational modes: HTTP for batch file processing, WebSocket for real-time streaming, and Celery task queues for asynchronous microservice architectures. The engine architecture is pluggable, with optional post-processing via recasepunc models to restore punctuation and capitalization when an engine's raw output lacks them. Containerized through a single parameterized Dockerfile, with GPU acceleration support for compute-intensive backends.
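To make the pluggable-engine idea concrete, here is a minimal sketch of an engine registry with an optional post-processing step. It is not linto-stt's actual code: `register_engine`, `transcribe`, `dummy_engine`, and `naive_recase` are hypothetical names invented for illustration, and the dummy engine simply returns a fixed string in place of a real model.

```python
from typing import Callable, Dict, Optional

# All names here are hypothetical; they do not mirror linto-stt's code.
Engine = Callable[[bytes], str]        # audio bytes -> raw transcript
PostProcessor = Callable[[str], str]   # raw transcript -> cleaned transcript

_ENGINES: Dict[str, Engine] = {}


def register_engine(name: str) -> Callable[[Engine], Engine]:
    """Decorator that registers an engine under a lookup name."""
    def decorator(fn: Engine) -> Engine:
        _ENGINES[name] = fn
        return fn
    return decorator


@register_engine("dummy-lowercase")
def dummy_engine(audio: bytes) -> str:
    # Stand-in for a real model: ignores the audio and returns
    # lowercase text without punctuation.
    return "hello world this is a test"


def naive_recase(text: str) -> str:
    # Placeholder for a punctuation/capitalization model:
    # capitalizes the first letter and appends a period.
    return (text[:1].upper() + text[1:] + ".") if text else text


def transcribe(audio: bytes, engine: str,
               post: Optional[PostProcessor] = None) -> str:
    """Run the selected engine, then the optional post-processor."""
    if engine not in _ENGINES:
        raise KeyError(f"unknown engine: {engine!r}")
    raw = _ENGINES[engine](audio)
    return post(raw) if post else raw
```

Calling `transcribe(audio, "dummy-lowercase", naive_recase)` runs the registered engine and then the post-processor. Under this design, swapping engines means changing only the lookup name, which is the property the description attributes to the pluggable architecture.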
About linto-studio
linto-ai/linto-studio
Transcription and annotation interface for recorded audio or video files
Builds on a microservices architecture that integrates LinTO's transcription engine with speaker diarization, automatic timestamp alignment, and NLP-powered features such as closed-caption editing. Includes a companion mobile app for on-the-go recording with media synchronization, and deploys via Docker with optional SMTP for sharing and authentication. Full functionality requires LinTO's service stack to be accessible through an API gateway.
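As an illustration of how diarized, timestamped transcript segments relate to closed captions, the sketch below renders segments as SubRip (SRT) text. The `Segment` structure and the function names are assumptions made for this example and are not linto-studio's data model or export code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical shape of one diarized, time-aligned segment.
    speaker: str
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT caption blocks."""
    if not segments:
        return ""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}"
        )
    return "\n\n".join(blocks) + "\n"
```

Each caption block carries its index, a time range, and the speaker-labelled text, so editing a caption amounts to changing one segment's fields and re-rendering.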