lyuchenyang/Efficient-VideoQA
Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"
No commits in the last 6 months.
Stars: 2
Forks: —
Language: Python
License: Apache-2.0
Category:
Last pushed: Jul 04, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lyuchenyang/Efficient-VideoQA"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.
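The same endpoint can be called from Python instead of curl. A minimal sketch, assuming the endpoint returns JSON (the response schema is not documented here, so `fetch_quality` is illustrative):

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-API URL for a repository, mirroring the curl example."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """Fetch and decode the payload; JSON response is an assumption."""
    with urlopen(quality_url(ecosystem, owner, repo), timeout=10) as resp:
        return json.load(resp)

# Same URL as the curl command above:
print(quality_url("transformers", "lyuchenyang", "Efficient-VideoQA"))
```

No key is needed for the first 100 requests per day; with a key, it would presumably be passed as a header or query parameter (not specified here).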
Higher-rated alternatives
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
kyegomez/PALI3
Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"
kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
kyegomez/PALM-E
Implementation of "PaLM-E: An Embodied Multimodal Language Model"