yousefkotp/Visual-Question-Answering
A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach, built for the VizWiz Grand Challenge 2023. It carefully curates the answer vocabulary and adds a linear layer on top of OpenAI's CLIP model, which serves as the image and text encoder.
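To make the "non-generative" idea concrete, here is a minimal sketch in plain Python of a linear layer that scores a fixed answer vocabulary from an image embedding and a question embedding. It is an illustration under stated assumptions, not the repository's code: the embedding size, the four-answer vocabulary, the random stand-in embeddings, and the choice to fuse the two embeddings by concatenation are all hypothetical. A real pipeline would take the embeddings from CLIP and train the layer in a deep learning framework.

```python
import math
import random

# Hypothetical sizes and answers, chosen only for illustration.
EMBED_DIM = 8
ANSWERS = ["yes", "no", "unanswerable", "white"]


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a linear layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, weights, bias):
    """Compute W @ x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image_emb, text_emb, weights, bias):
    """Fuse the two embeddings and pick the highest-scoring vocabulary answer."""
    fused = image_emb + text_emb  # list concatenation (assumed fusion strategy)
    probs = softmax(linear(fused, weights, bias))
    best = max(range(len(ANSWERS)), key=probs.__getitem__)
    return ANSWERS[best], probs


rng = random.Random(0)
# Random stand-ins for encoder outputs; a real pipeline would get these from CLIP.
image_emb = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
text_emb = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
weights, bias = make_linear(2 * EMBED_DIM, len(ANSWERS), rng)

answer, probs = predict(image_emb, text_emb, weights, bias)
print(answer, [round(p, 3) for p in probs])
```

Because the model only selects among entries of the vocabulary, it can never produce an answer outside that list, which is why curating the vocabulary matters for this kind of approach.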
No commits in the last 6 months.
Stars: 14
Forks: 7
Language: Jupyter Notebook
License: —
Category:
Last pushed: Jun 27, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/yousefkotp/Visual-Question-Answering"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
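The same request can be sketched in Python using only the standard library. The helper names `build_url` and `fetch` are hypothetical, and reading the first path segment after `quality` (`ml-frameworks`) as a category is an assumption drawn from the URL above. The response format is not described here, so the sketch returns the raw body as text rather than parsing it.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, repo):
    """Build the endpoint URL; `repo` is an "owner/name" string (slash is kept)."""
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch(url, timeout=10):
    """Send a GET request and return the response body as text (format assumed UTF-8)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


url = build_url("ml-frameworks", "yousefkotp/Visual-Question-Answering")
print(url)
```

Calling `fetch(url)` performs the actual network request; it is left out of the top-level code so the snippet runs without network access.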
Higher-rated alternatives
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis