ntkhoa95/multimodal-for-vision
Vision Framework: A modular multi-agent system for computer vision tasks, featuring natural language queries, intelligent task routing, and specialized agents for classification, detection, and more. Built with PyTorch and modern deep learning models.
No commits in the last 6 months.
Stars: 7
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Nov 07, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/ntkhoa95/multimodal-for-vision"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
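The same request can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the service's documented client: the function names `build_quality_url` and `fetch_quality` are hypothetical, the path layout (category, then owner, then repository) is inferred from the single URL above, and the response body is assumed to be JSON, which the listing does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, category="ml-frameworks"):
    """Build the endpoint URL in the same shape as the curl command.

    Assumption: the path segments are category/owner/repo, as inferred
    from the one example URL in this listing.
    """
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="ml-frameworks", timeout=10):
    """Fetch the quality data for one repository.

    Assumption: the endpoint returns a JSON body. If it does not,
    json.loads raises a ValueError.
    """
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ntkhoa95", "multimodal-for-vision")` would issue the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is a reasonable choice for repeated lookups.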
Higher-rated alternatives
facebookresearch/mmf: A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
open-mmlab/mmpretrain: OpenMMLab Pre-training Toolbox and Benchmark
friedrichor/Awesome-Multimodal-Papers: A curated list of awesome Multimodal studies.
KaiyangZhou/pytorch-vsumm-reinforce: Unsupervised video summarization with deep reinforcement learning (AAAI'18)
adambielski/siamese-triplet: Siamese and triplet networks with online pair/triplet mining in PyTorch