sanjifr3/Narrator

An image and video description generator using a CNN-RNN-based architecture.

Experimental

Integrates with Amazon Polly for text-to-speech synthesis and PySceneDetect for automatic video scene segmentation, enabling accessibility-focused description generation. Implements separate PyTorch models trained on COCO 2014 (image) and MSR-VTT (video) datasets using ResNet/VGG encoders paired with LSTM/GRU decoders, with optional beam search decoding. Deployable as both a Flask web service on AWS and a standalone Python API for flexible integration workflows.
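The description mentions optional beam search decoding for the LSTM/GRU decoders. The block below is a minimal, decoder-agnostic sketch of that technique. It is not the repository's code. It assumes a hypothetical `step_fn(prefix)` that returns next-token log-probabilities, standing in for one decoder step conditioned on the CNN image or video features. The `toy_step` table is invented purely for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface (assumption): given the tokens generated so far,
# return a dict mapping each candidate next token to its log-probability.
StepFn = Callable[[Sequence[str]], Dict[str, float]]


def beam_search(step_fn: StepFn, start: str, end: str,
                beam_width: int = 3, max_len: int = 20) -> List[str]:
    """Return the highest-scoring caption tokens (start/end tokens removed)."""
    # Each entry holds (cumulative log-probability, token sequence).
    beams: List[Tuple[float, List[str]]] = [(0.0, [start])]
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        # Expand every live beam by every candidate next token.
        candidates = [
            (score + logp, seq + [token])
            for score, seq in beams
            for token, logp in step_fn(seq).items()
        ]
        # Keep only the beam_width best expansions overall.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            # Sequences that emitted the end token stop growing.
            (finished if seq[-1] == end else beams).append((score, seq))
        if not beams:
            break
    # Prefer completed captions; fall back to unfinished beams at max_len.
    pool = finished or beams
    if not pool:
        return []
    best = max(pool, key=lambda c: c[0])[1]
    return [t for t in best[1:] if t != end]


# Invented toy "decoder": next-token probabilities depend only on the last token.
TOY = {
    "<s>": {"a": math.log(0.6), "the": math.log(0.4)},
    "a": {"dog": math.log(0.3), "cat": math.log(0.3), "</s>": math.log(0.4)},
    "the": {"dog": math.log(0.9), "</s>": math.log(0.1)},
    "dog": {"</s>": 0.0},
    "cat": {"</s>": 0.0},
}


def toy_step(prefix: Sequence[str]) -> Dict[str, float]:
    return TOY[prefix[-1]]


print(beam_search(toy_step, "<s>", "</s>", beam_width=1))
print(beam_search(toy_step, "<s>", "</s>", beam_width=2))
```

With `beam_width=1` the search reduces to greedy decoding and returns `["a"]` (probability 0.6 × 0.4 = 0.24). With `beam_width=2` it keeps the initially less likely "the" branch alive and recovers `["the", "dog"]` (0.4 × 0.9 = 0.36). This sketch does not length-normalize scores, which practical captioners often do to avoid favoring short outputs.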

No commits in the last 6 months.

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 9 / 25

Community 7 / 25

Language: Jupyter Notebook

License: MIT

Higher-rated alternatives

AlimTleuliyev/image-to-audio

Image Captioning and Text-to-Speech

sidphbot/visual-to-audio-aid-for-visually-impaired

A system to process visual input on timed frames to produce sensible audio aid in accordance...

Abhradipta/OCR-With-Read-Out-Loud-Using-Python

An Optical Character Recognition (OCR) system designed using Python to read the contents out loud.

ahmedgulabkhan/TEI2S

TEI2S is a project which is really helpful for the visually impaired, in the sense that it takes...

SARIT42/image-Annotation-Speech

Explaining the contents of an image in the form of speech through caption generation using...
