ttengwang/Caption-Anything
Caption-Anything is a versatile tool that combines image segmentation, visual captioning, and ChatGPT to generate captions tailored to user preferences through a range of controls.
Hugging Face Spaces: https://huggingface.co/spaces/TencentARC/Caption-Anything and https://huggingface.co/spaces/VIPLab/Caption-Anything
Integrates Segment Anything for zero-shot object segmentation with BLIP/BLIP-2 visual encoders and LangChain for conversational VQA, enabling multi-modal control over caption generation through visual prompts (click coordinates, trajectories) and textual parameters (length, sentiment, factuality). Provides a Gradio interface with configurable model backends and supports paragraph-level captioning with batch processing across multiple objects in a single image.
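The combination of visual prompts and textual controls described above can be illustrated with a minimal Python sketch. This is not the project's actual API: the names CaptionControls, ClickPrompt, and build_refinement_prompt are hypothetical, and the prompt wording is an assumption made for illustration. The only borrowed convention is that point prompts carry a label of 1 for a foreground click and 0 for a background click.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CaptionControls:
    """Hypothetical container for the textual controls (length, sentiment, factuality)."""
    length: int = 10            # target caption length in words
    sentiment: str = "natural"  # e.g. "positive", "negative", "natural"
    factual: bool = True        # False permits imaginative wording


@dataclass
class ClickPrompt:
    """Hypothetical visual prompt: clicked pixel coordinates plus one label per click."""
    points: List[Tuple[int, int]]
    labels: List[int]  # 1 = foreground click, 0 = background click

    def __post_init__(self) -> None:
        if len(self.points) != len(self.labels):
            raise ValueError("each point needs exactly one label")


def build_refinement_prompt(raw_caption: str, controls: CaptionControls) -> str:
    """Turn a raw caption and the chosen controls into an instruction for a language model."""
    if controls.factual:
        style = "Stick to what is visible."
    else:
        style = "Imaginative details are allowed."
    return (
        f"Rewrite the caption '{raw_caption}' in about {controls.length} words "
        f"with a {controls.sentiment} tone. {style}"
    )


# Example: one foreground click and a request for a short, positive caption.
click = ClickPrompt(points=[(320, 240)], labels=[1])
controls = CaptionControls(length=8, sentiment="positive", factual=True)
print(build_refinement_prompt("a dog on a couch", controls))
```

In a real pipeline the click would first go to the segmentation model to select an object, the captioner would describe the masked region, and only then would a prompt like this one ask a language model to restyle the result.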
1,774 stars. No commits in the last 6 months.
Stars: 1,774
Forks: 104
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Aug 29, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ttengwang/Caption-Anything"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
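The same request can be made from Python with only the standard library. The sketch below rests on two assumptions that the page does not confirm: that the URL follows a category/owner/repo pattern, as the curl example suggests, and that the endpoint returns JSON. The function names quality_url and fetch_quality are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the path pattern below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a category/owner/repo path layout."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository, assuming the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("llm-tools", "ttengwang", "Caption-Anything"))
```

Keeping URL construction separate from the network call makes the request easy to inspect before it is sent.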
Higher-rated alternatives
tychenjiajun/exif-ai
A Node.js CLI and library that uses OpenAI, Ollama, ZhipuAI, Google Gemini or Coze to write...
FennelFetish/qapyq
An image viewer and AI-assisted editing/captioning/masking tool that helps with curating...
Kuberwastaken/meow
The most Purr-fect Image File Format for your AI workflows
DavidMChan/caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
blib-la/captain
Give your computer an AI Brain