astorfi/lip-reading-deeplearning
:unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
Implements coupled 3D CNNs that process temporal and spatial dimensions jointly to learn multimodal feature representations for audio-visual correspondence matching. Includes preprocessing pipelines for lip tracking via dlib, mouth-region extraction from video frames, and MFEC (mel-frequency energy coefficient) feature extraction from the audio stream. Built with TensorFlow and designed around utterance-level features, enabling applications beyond lip reading, such as cross-modal speaker verification and speech recognition in noisy conditions.
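The mouth-region extraction step above can be sketched without the full dlib pipeline. This is a minimal, hedged sketch: it assumes dlib's standard 68-point facial-landmark convention, in which indices 48-67 outline the mouth, and the helper names (`mouth_bbox`, `crop`) are illustrative, not part of the repository's API.

```python
import numpy as np

# Indices 48-67 outline the mouth in dlib's 68-point landmark convention.
MOUTH_IDX = slice(48, 68)

def mouth_bbox(landmarks, margin=0.10):
    """Return an (x0, y0, x1, y1) box around the mouth landmarks,
    expanded by `margin` (a fraction of the box size) on each side."""
    pts = np.asarray(landmarks, dtype=float)[MOUTH_IDX]
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    mx, my = margin * (x1 - x0), margin * (y1 - y0)
    return (x0 - mx, y0 - my, x1 + mx, y1 + my)

def crop(frame, bbox):
    """Crop an (H, W, ...) frame array to the integer-rounded box."""
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    return frame[max(y0, 0):y1, max(x0, 0):x1]
```

In a real pipeline the 68 landmarks would come from a detector such as dlib's `shape_predictor` run on each frame; the cropped mouth regions are then stacked over time into the spatio-temporal input the 3D CNN consumes.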
1,901 stars. No commits in the last 6 months.
Stars: 1,901
Forks: 333
Language: Python
License: Apache-2.0
Category: ml-frameworks
Last pushed: Nov 07, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/astorfi/lip-reading-deeplearning"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
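For programmatic use, the curl command above can be wrapped with the standard library. A minimal sketch, assuming only the URL shown on this page: the response schema is not documented here, and passing the optional key via an `X-API-Key` header is an assumption to verify against the API docs.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Build the per-repository quality endpoint URL shown above."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, api_key=None, timeout=10):
    """Fetch quality data for a repo and parse the JSON response.
    The X-API-Key header is an assumption; check the API docs."""
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        req.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Example: `fetch_quality("ml-frameworks", "astorfi", "lip-reading-deeplearning")` requests the same resource as the curl command, staying within the 100 requests/day keyless limit.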
Related frameworks
deepconvolution/LipNet
Automated lip reading from real-time video, implemented with TensorFlow in Python
d-kavinraja/MouthMap
MouthMap is a deep learning-based lip reading system that converts silent video sequences into...
articulateinstruments/DeepLabCut-for-Speech-Production
Trained deep neural-net models for estimating articulatory keypoints from midsagittal ultrasound...
ZakirCodeArchitect/Sonic-Lipsync-AI
A Google Colab-based Gradio app for generating lip-synced videos using the Sonic model. It...
Cl0ud-9/Lip-Sync-Video-Generator
An AI-powered pipeline that transforms text into realistic lip-synced talking face videos using...