Image-to-Speech-GenAI-Tool-Using-LLM and image-to-text-to-speech
These are ecosystem siblings: both implement the same pipeline (image → LLM-generated text → speech synthesis) on the same technology stack (Hugging Face, OpenAI, and LangChain), differing only in implementation details and user interface rather than core functionality.
About Image-to-Speech-GenAI-Tool-Using-LLM
GURPREETKAURJETHRA/Image-to-Speech-GenAI-Tool-Using-LLM
An AI tool that generates a short audio story from the context of an uploaded image by prompting a GenAI LLM, using Hugging Face models together with OpenAI and LangChain.
Implements a three-stage pipeline: Salesforce's BLIP image-captioning model extracts visual context, OpenAI's GPT-3.5-turbo turns that caption into a short story via LangChain, and ESPnet's VITS text-to-speech model generates the audio output. Built with Streamlit for local deployment and published on both Streamlit Cloud and Hugging Face Spaces, with API tokens configured via environment variables.
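The three stages above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the Hugging Face model IDs (`Salesforce/blip-image-captioning-base`, `espnet/kan-bayashi_ljspeech_vits`), the environment-variable name, and the prompt wording are assumptions based on the pipeline description.

```python
import os

import requests  # third-party: pip install requests

# Assumed Hugging Face Inference API endpoint and model IDs; the actual
# project may pin different checkpoints or call the models locally.
HF_API = "https://api-inference.huggingface.co/models/"
CAPTION_MODEL = "Salesforce/blip-image-captioning-base"
TTS_MODEL = "espnet/kan-bayashi_ljspeech_vits"


def image_to_text(image_path: str, token: str) -> str:
    """Stage 1: extract visual context with BLIP image captioning."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            HF_API + CAPTION_MODEL,
            headers={"Authorization": f"Bearer {token}"},
            data=f.read(),
        )
    resp.raise_for_status()
    # The captioning endpoint returns a list of {"generated_text": ...}
    return resp.json()[0]["generated_text"]


def build_story_prompt(scenario: str) -> str:
    """Stage 2 input: wrap the caption in a storytelling instruction.

    In the real app this prompt would be sent to GPT-3.5-turbo through a
    LangChain chain; the wording here is illustrative.
    """
    return (
        "You are a storyteller. Generate a short story, no more than "
        f"50 words, based on this context: {scenario}"
    )


def text_to_speech(story: str, token: str, out_path: str = "story.flac") -> str:
    """Stage 3: synthesize audio from the story with a VITS TTS model."""
    resp = requests.post(
        HF_API + TTS_MODEL,
        headers={"Authorization": f"Bearer {token}"},
        json={"inputs": story},
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # raw audio bytes
    return out_path


if __name__ == "__main__":
    # Only runs end-to-end when a token is present in the environment.
    token = os.environ.get("HUGGINGFACE_API_TOKEN", "")
    if token:
        caption = image_to_text("photo.jpg", token)
        print("Caption:", caption)
        print("Prompt:", build_story_prompt(caption))
```

In the real tool, Streamlit provides the upload widget and audio player around these calls, and the story generation itself goes through OpenAI rather than the Hugging Face endpoint.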
About image-to-text-to-speech
semaj87/image-to-text-to-speech
An app that uses Hugging Face AI models together with OpenAI and LangChain to generate text from an image, then synthesizes audio from that text.