xuyang-liu16/VGDiffZero
[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
This project locates the object that a descriptive text phrase refers to within an image, without training or fine-tuning a custom model. You provide an image and a text query (e.g., "the red car"), and it outputs the location of the matching object in the image. It is aimed at researchers and practitioners in image analysis, computer vision, and visual search who need to identify and localize visual elements described by text.
No commits in the last 6 months.
Use this if you need to precisely locate objects in images using text descriptions, without the hassle of fine-tuning or training a new model.
Not ideal if your primary goal is generating new images from text, or if you don't need to localize specific objects.
Stars: 17
Forks: —
Language: Python
License: —
Category:
Last pushed: Feb 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/xuyang-liu16/VGDiffZero"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
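The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the path pattern (category, owner, repo) is inferred from the single curl example above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumes the path is <category>/<owner>/<repo>, as in the curl example.
    Each segment is percent-encoded so unusual characters stay in one segment.
    """
    segments = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the data for one repository (hypothetical helper).

    Assumes the body is JSON; no field names are assumed here because the
    response schema is not documented on this page.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make the actual request.
    print(quality_url("diffusion", "xuyang-liu16", "VGDiffZero"))
```

Keeping URL construction separate from the network call makes it easy to stay within the daily request limit while experimenting, since the URL can be checked without sending anything.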
Higher-rated alternatives
UCSC-VLAA/story-iter
[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization
PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks,...
keivalya/mini-vla
a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to...
adobe-research/custom-diffusion
Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
byliutao/1Prompt1Story
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...