astorfi/lip-reading-deeplearning
:unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
Implements coupled 3D CNNs that process temporal and spatial dimensions jointly to learn multimodal feature representations for audio-visual correspondence matching. Includes preprocessing pipelines for lip tracking via dlib, mouth-region extraction from video frames, and MFEC (mel-frequency energy coefficient) feature extraction from the audio stream. Built with TensorFlow and designed around utterance-level features, enabling applications beyond lip reading, such as cross-modal speaker verification and speech recognition in noisy conditions.
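The mouth-region extraction step above can be sketched without the full dlib pipeline. This is a minimal, hedged sketch: it assumes dlib's standard 68-point facial-landmark convention, in which indices 48-67 outline the mouth, and the helper names (`mouth_bbox`, `crop`) are illustrative, not part of the repository's API.

```python
import numpy as np

# Indices 48-67 outline the mouth in dlib's 68-point landmark convention.
MOUTH_IDX = slice(48, 68)

def mouth_bbox(landmarks, margin=0.10):
    """Return an (x0, y0, x1, y1) box around the mouth landmarks,
    expanded by `margin` (a fraction of the box size) on each side."""
    pts = np.asarray(landmarks, dtype=float)[MOUTH_IDX]
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    mx, my = margin * (x1 - x0), margin * (y1 - y0)
    return (x0 - mx, y0 - my, x1 + mx, y1 + my)

def crop(frame, bbox):
    """Crop an (H, W, ...) frame array to the integer-rounded box."""
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    return frame[max(y0, 0):y1, max(x0, 0):x1]
```

In a real pipeline the 68 landmarks would come from a detector such as dlib's `shape_predictor` run on each frame; the cropped mouth regions are then stacked over time into the spatio-temporal input the 3D CNN consumes.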
1,901 stars. No commits in the last 6 months.
Stars: 1,901
Forks: 333
Language: Python
License: Apache-2.0
Category: ml-frameworks
Last pushed: Nov 07, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/astorfi/lip-reading-deeplearning"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
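For programmatic use, the curl command above can be wrapped with the standard library. A minimal sketch, assuming only the URL shown on this page: the response schema is not documented here, and passing the optional key via an `X-API-Key` header is an assumption to verify against the API docs.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Build the per-repository quality endpoint URL shown above."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, api_key=None, timeout=10):
    """Fetch quality data for a repo and parse the JSON response.
    The X-API-Key header is an assumption; check the API docs."""
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        req.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Example: `fetch_quality("ml-frameworks", "astorfi", "lip-reading-deeplearning")` requests the same resource as the curl command, staying within the 100 requests/day keyless limit.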
Related frameworks
deepconvolution/LipNet
Automated lip reading from real-time video, implemented with TensorFlow in Python
d-kavinraja/MouthMap
MouthMap is a deep learning-based lip reading system that converts silent video sequences into...
articulateinstruments/DeepLabCut-for-Speech-Production
Trained deep neural-net models for estimating articulatory keypoints from midsagittal ultrasound...
ZakirCodeArchitect/Sonic-Lipsync-AI
A Google Colab-based Gradio app for generating lip-synced videos using the Sonic model. It...
Cl0ud-9/Lip-Sync-Video-Generator
An AI-powered pipeline that transforms text into realistic lip-synced talking face videos using...