showlab/Image2Paragraph
[Image 2 Text Para] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
A multi-stage vision-language pipeline that generates a detailed paragraph for an image by combining the outputs of BLIP2 (image captions), GRIT (dense object descriptions), and Semantic Segment Anything (region classification), then synthesizing them with ChatGPT/GPT4. The pipeline is designed for low GPU memory use (it runs on 8 GB in about 20 seconds) through selective device offloading, and it includes a Gradio interface for interactive use. The generated paragraphs improve image-text retrieval performance compared with using the raw images, which suggests that the compressed textual representation captures semantically relevant information more effectively.
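The composition step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `ImageDescriptions` container, the `build_prompt` function, and the prompt wording are all hypothetical, and the three inputs stand in for whatever BLIP2, GRIT, and Semantic Segment Anything would produce.

```python
from dataclasses import dataclass, field


@dataclass
class ImageDescriptions:
    """Hypothetical container for the three intermediate outputs."""
    caption: str  # global caption, e.g. from BLIP2
    dense_captions: list[str] = field(default_factory=list)  # e.g. from GRIT
    region_labels: list[str] = field(default_factory=list)   # e.g. from Semantic Segment Anything


def build_prompt(desc: ImageDescriptions) -> str:
    """Merge the intermediate outputs into one text prompt for a language model.

    The wording of the instructions is an assumption made for illustration.
    """
    lines = [
        "Write one detailed paragraph describing an image.",
        "Use only the facts listed below.",
        f"Caption: {desc.caption}",
    ]
    # Optional sections are added only when the corresponding model produced output.
    if desc.dense_captions:
        lines.append("Objects: " + "; ".join(desc.dense_captions))
    if desc.region_labels:
        lines.append("Regions: " + "; ".join(desc.region_labels))
    return "\n".join(lines)
```

The resulting string would then be sent to a chat model such as ChatGPT. That call is omitted here because it depends on the client library and credentials in use.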
824 stars. No commits in the last 6 months.
Stars: 824
Forks: 56
Language: Python
License: Apache-2.0
Category:
Last pushed: Apr 28, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/showlab/Image2Paragraph"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
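The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions: that other repositories follow the same `<owner>/<repo>` path pattern as the URL above, and that the body is UTF-8 text. The response schema is not documented here, so the body is returned as a raw string. `build_url` and `fetch_quality` are illustrative names, not part of any published client.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the owner/repo suffix pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository, percent-encoding each path segment."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (UTF-8 assumed); no key is sent."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("showlab", "Image2Paragraph")` issues the same GET request as the curl command. Keyless access is limited to 100 requests/day, so callers may want to cache the result.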
Higher-rated alternatives
sethkarten/pokechamp
Official repository of the spotlight ICML 2025 paper, PokeChamp: an Expert-level Minimax Language Agent.
Leo-Corporation/Passliss
Passliss is a web application that allows you to generate secure passwords, test the strength of...
liruifengv/we-drawing
AI drawing. Generates an AI image each day from a line of classical Chinese poetry.
ehsanghaffar/einbiogpt
An intelligent web application built with Next.js, Tailwind CSS, and OpenAI's GPT models. It...
IDouble/ChatGPT-Simple-Tutorial-Image-Text-Code-Generation
🖼️ A simple ChatGPT AI tutorial on how to generate images/text/code and its limitations 🤖