ltguo19/VSUA-Captioning
Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019
Represents each image as a structured graph of Visual Semantic Units (objects, attributes, and relationships) built from scene graphs and bottom-up attention features, then aligns these units with caption words during generation. Training runs in two stages: cross-entropy pretraining, followed by reinforcement learning with CIDEr rewards. Built on PyTorch with geometry and semantic relationship graphs, and relies on pre-extracted bottom-up features and scene graph annotations for the COCO dataset.
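As a rough illustration of the two training stages, the sketch below computes a cross-entropy loss for stage one and a baseline-adjusted policy-gradient loss for stage two. It is plain Python and not taken from the repository: the function names are hypothetical, the reward values are made-up placeholders rather than real CIDEr scores, and the use of a second caption's reward as the baseline is an assumption about how the reward might be normalized.

```python
import math

def cross_entropy_loss(gt_token_probs):
    """Stage 1 (sketch): mean negative log-likelihood of the ground-truth tokens.

    gt_token_probs holds the probability the model assigned to each
    ground-truth caption word.
    """
    return -sum(math.log(p) for p in gt_token_probs) / len(gt_token_probs)

def policy_gradient_loss(sampled_log_probs, sampled_reward, baseline_reward):
    """Stage 2 (sketch): REINFORCE-style loss with a baseline.

    sampled_log_probs are the log-probabilities of the words in a sampled
    caption. The rewards stand in for caption-level scores such as CIDEr;
    using a baseline caption's reward is an assumption, not a confirmed
    detail of the repository.
    """
    advantage = sampled_reward - baseline_reward
    # Minimizing this raises the likelihood of captions that beat the baseline.
    return -advantage * sum(sampled_log_probs) / len(sampled_log_probs)

# Placeholder numbers for illustration only.
xe = cross_entropy_loss([0.5, 0.25, 0.5])
rl = policy_gradient_loss(
    [math.log(0.5), math.log(0.25)],
    sampled_reward=1.2,
    baseline_reward=0.9,
)
```

When the sampled caption scores above the baseline, the advantage is positive and minimizing the loss increases the log-probability of that caption; when it scores below, the same term pushes the probability down.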
258 stars. No commits in the last 6 months.
Stars: 258
Forks: 24
Language: Python
License: MIT
Category:
Last pushed: Oct 18, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ltguo19/VSUA-Captioning"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
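The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names are hypothetical, the assumption that the path segments are category, owner, and repository is inferred from the single URL shown above, and the body is returned as raw text because the response schema is not described here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def build_quality_url(category, owner, repo):
    """Build the endpoint URL (hypothetical helper).

    Assumes the path layout is /<category>/<owner>/<repo>, inferred from
    the one example URL on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)

def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and return the response body as text.

    No parsing is attempted, since the response format is not documented here.
    """
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_quality("nlp", "ltguo19", "VSUA-Captioning") issues the same GET request as the curl command, and each call counts against the daily request limit.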
Higher-rated alternatives
ntrang086/image_captioning
generate captions for images using a CNN-RNN model that is trained on the Microsoft Common...
fregu856/CS224n_project
Neural Image Captioning in TensorFlow.
vacancy/SceneGraphParser
A Python toolkit for parsing captions (in natural language) into scene graphs (as symbolic...
Abdelrhman-Yasser/video-content-description
Video content description model for generating descriptions for unconstrained videos
kozodoi/BMS_Molecular_Translation
Image-to-text translation of chemical molecule structures with deep learning (top-5% Kaggle solution)