salesforce/LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

/ 100

Emerging

Provides unified interfaces for vision-language tasks (captioning, VQA, retrieval, dialogue) with modular pre-trained components such as BLIP-2 and InstructBLIP, which pair frozen LLMs with an efficient Q-Former bridging layer. Integrates curated multimodal datasets and supports instruction-tuned models that work zero-shot across image, video, and audio modalities without extensive task-specific customization.

11,183 stars. No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25
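Each of the four categories above is scored out of 25, so together they can reach at most 100. The page does not state how the overall score is derived. Assuming it is simply the unweighted sum of the four sub-scores, the arithmetic is:

```python
# Sub-scores as listed on this page; each category is scored out of 25.
# ASSUMPTION: the overall score is the unweighted sum of the four categories.
sub_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 20,
}
MAX_PER_CATEGORY = 25

total = sum(sub_scores.values())
max_total = MAX_PER_CATEGORY * len(sub_scores)
print(f"{total} / {max_total}")  # prints "46 / 100"
```

Under that assumption, the zero Maintenance score reflects the lack of commits in the last 6 months. Adoption, Maturity, and Community account for the rest of the total.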


Stars

11,183

Forks

1,101

Language

Jupyter Notebook

License

BSD-3-Clause

Higher-rated alternatives

rom1504/img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M...

devrimcavusoglu/pybboxes

Lightweight toolkit for bounding boxes providing conversion between bounding box types and...

PyRetri/PyRetri

Open source deep learning based unsupervised image retrieval toolbox built on PyTorch🔥

Particle1904/DatasetHelpers

Dataset Helper program to automatically select, rescale and tag Datasets (composed of image and...

haltakov/natural-language-image-search

Search photos on Unsplash using natural language
