ttengwang/Caption-Anything

Caption-Anything is a versatile tool that combines image segmentation, visual captioning, and ChatGPT to generate tailored captions, with diverse controls for user preferences.
Demos: https://huggingface.co/spaces/TencentARC/Caption-Anything and https://huggingface.co/spaces/VIPLab/Caption-Anything

Quality score: 43 / 100 (Emerging)

Integrates Segment Anything (SAM) for zero-shot object segmentation, BLIP/BLIP-2 models for visual captioning, and LangChain for conversational visual question answering (VQA). Caption generation can be steered with visual prompts (click coordinates, trajectories) and textual parameters (length, sentiment, factuality). It ships a Gradio interface with configurable model backends and supports paragraph-level captioning, with batch processing across multiple objects in a single image.
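To make the two kinds of control concrete, here is a minimal sketch of how visual prompts and textual parameters might be bundled and turned into an instruction for the ChatGPT refinement step. The class `CaptionControls`, the function `build_refine_prompt`, and all field names are hypothetical illustrations, not the repository's actual API; only the control categories (clicks, trajectories, length, sentiment, factuality) come from the description above. The click label convention (1 = foreground, 0 = background) follows SAM's point-prompt format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical container for Caption-Anything-style controls.
# Field names are illustrative assumptions, not the project's real API.
@dataclass
class CaptionControls:
    # Visual prompts: (x, y, label) clicks, label 1 = foreground, 0 = background
    clicks: List[Tuple[int, int, int]] = field(default_factory=list)
    # Visual prompts: free-form trajectory as a list of (x, y) points
    trajectory: List[Tuple[int, int]] = field(default_factory=list)
    # Textual controls
    length: int = 10            # target caption length in words
    sentiment: str = "natural"  # e.g. "positive", "negative", "natural"
    factuality: bool = True     # if True, forbid details not in the raw caption


def build_refine_prompt(raw_caption: str, c: CaptionControls) -> str:
    """Combine a raw region caption with textual controls into an LLM instruction."""
    parts = [
        f"Rewrite the caption '{raw_caption}'",
        f"in about {c.length} words",
        f"with a {c.sentiment} tone",
    ]
    if c.factuality:
        parts.append("without adding details that are not in the original caption")
    else:
        parts.append("adding imaginative but plausible details")
    return ", ".join(parts) + "."


if __name__ == "__main__":
    controls = CaptionControls(clicks=[(120, 80, 1)], length=8, sentiment="positive")
    print(build_refine_prompt("a dog on grass", controls))
```

In this sketch the visual prompts would go to the segmenter to pick the region, while only the textual fields shape the LLM instruction; keeping them in one object simply mirrors the "multi-modal control" idea.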

1,774 stars. No commits in the last 6 months.

Flags: Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 17 / 25


Stars: 1,774
Forks: 104
Language: Python
License: BSD-3-Clause
Last pushed: Aug 29, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ttengwang/Caption-Anything"
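The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the endpoint is taken from the curl command, the response is assumed to be JSON (its structure is not documented on this page), and because the page does not say how a free key is supplied, the sketch sends anonymous requests only. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base endpoint taken from the curl example; the JSON response format is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(repo: str) -> str:
    """Build the quality endpoint URL for an 'owner/name' repository slug."""
    owner, name = repo.split("/", 1)
    return f"{BASE_URL}/{owner}/{name}"


def fetch_quality(repo: str, timeout: float = 10.0) -> dict:
    """Send an anonymous GET request and decode the body as JSON."""
    req = urllib.request.Request(quality_url(repo), headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_quality("ttengwang/Caption-Anything"))
```

Anonymous requests fall under the 100 requests/day limit, so callers polling many repositories may want to cache results.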

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.