xuyang-liu16/VGDiffZero

[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders

/ 100

Experimental

This project locates a specific object in an image from a descriptive text phrase, without training a custom model. You provide an image and a text query (e.g., "the red car"), and it outputs the location of the matching object in the image. It is aimed at researchers and practitioners in image analysis, computer vision, and visual search who need to localize visual elements described by text.
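The input/output contract described above can be sketched as follows. This is a minimal illustration, not the repository's API: it assumes grounding is framed as picking one box from a set of candidate boxes using a text-conditioned score where lower means a better match. The names `ground_phrase` and `toy_score`, the box coordinates, and the score values are all hypothetical; in a real system the score would come from a model rather than a lookup table.

```python
from typing import Callable, Dict, Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]


def ground_phrase(
    proposals: Sequence[Box],
    query: str,
    score_fn: Callable[[Box, str], float],
) -> Box:
    """Return the candidate box with the lowest score for `query`.

    `score_fn` is a hypothetical stand-in for whatever model-based
    mismatch score a zero-shot grounder would compute per region.
    """
    if not proposals:
        raise ValueError("at least one candidate box is required")
    return min(proposals, key=lambda box: score_fn(box, query))


# Made-up scores for illustration only (lower = better match).
_TOY_SCORES: Dict[Box, float] = {
    (0, 0, 100, 100): 0.82,
    (40, 60, 200, 180): 0.31,
    (150, 20, 300, 90): 0.57,
}


def toy_score(box: Box, query: str) -> float:
    # Placeholder: ignores the query and looks up a fixed value.
    return _TOY_SCORES[box]


candidates = list(_TOY_SCORES)
best_box = ground_phrase(candidates, "the red car", toy_score)
print(best_box)
```

Passing the scoring function as a parameter keeps the selection step independent of any particular model, so the same sketch applies whichever scorer is plugged in.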

No commits in the last 6 months.

Use this if you need to precisely locate objects in images using text descriptions, without the hassle of fine-tuning or training a new model.

Not ideal if your primary goal is generating new images from text or if you don't need highly specific object localization.

image-analysis visual-search object-localization computer-vision content-tagging

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 0 / 25

Stars

Forks

—

Language: Python

License: none

Higher-rated alternatives

UCSC-VLAA/story-iter

[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization

PaddlePaddle/PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks,...

keivalya/mini-vla

a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to...

adobe-research/custom-diffusion

Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)

byliutao/1Prompt1Story

🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...
