sanjifr3/Narrator
An image and video description generator using a CNN-RNN-based architecture.
Integrates with Amazon Polly for text-to-speech synthesis and PySceneDetect for automatic video scene segmentation, enabling accessibility-focused description generation. Implements separate PyTorch models trained on COCO 2014 (image) and MSR-VTT (video) datasets using ResNet/VGG encoders paired with LSTM/GRU decoders, with optional beam search decoding. Deployable as both a Flask web service on AWS and a standalone Python API for flexible integration workflows.
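The optional beam search decoding mentioned above can be illustrated with a small, framework-free sketch. This is not the repository's implementation: the `beam_search` function, the `toy_step` model, and the `<start>`/`<end>` tokens are all hypothetical stand-ins for an LSTM/GRU decoder that returns next-token probabilities.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical special tokens for this toy example.
START, END = "<start>", "<end>"

Hypothesis = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 10,
) -> List[Hypothesis]:
    """Return hypotheses sorted best first, keeping `beam_width` per step."""
    beams: List[Hypothesis] = [([START], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [tok], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the highest-scoring candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == END:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # If nothing reached END within max_len, fall back to the live beams.
    results = finished or beams
    return sorted(results, key=lambda c: c[1], reverse=True)


# Toy "decoder": next-token probabilities depend only on the last token.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    START: {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {END: 1.0},
    "cat": {END: 1.0},
}


def toy_step(tokens: List[str]) -> Dict[str, float]:
    return TOY_MODEL[tokens[-1]]


greedy = beam_search(toy_step, beam_width=1)[0]
wider = beam_search(toy_step, beam_width=2)[0]
print(" ".join(greedy[0]), round(math.exp(greedy[1]), 2))
print(" ".join(wider[0]), round(math.exp(wider[1]), 2))
```

With a beam width of 1 the search is greedy and commits to "a" first, ending on a sequence with probability 0.33. With a width of 2 it keeps "the" alive long enough to find the higher-probability sequence (0.36), which is the usual motivation for offering beam search as a decoding option.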
No commits in the last 6 months.
Stars: 25
Forks: 2
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jul 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/sanjifr3/Narrator"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
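The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names `build_quality_url` and `fetch_quality` are hypothetical, and reading the three path segments as category, owner, and repository is an assumption based on the URL above. The listing does not state the response format, so the body is returned as plain text.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (format not documented here)."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("voice-ai", "sanjifr3", "Narrator"))
```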
Higher-rated alternatives
AlimTleuliyev/image-to-audio
Image Captioning and Text-to-Speech
sidphbot/visual-to-audio-aid-for-visually-impaired
A system to process visual input on timed frames to produce sensible audio aid in accordance...
Abhradipta/OCR-With-Read-Out-Loud-Using-Python
An Optical Character Recognition (OCR) System designed using Python to read the contents out loud.
ahmedgulabkhan/TEI2S
TEI2S is a project which is really helpful for the visually impaired, in a sense that it takes...
SARIT42/image-Annotation-Speech
Explaining the contents of an image in the form of speech through caption generation using...