visual question answering

ml
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
- basic ideas of co-attention; how humans look at the center of the image and the center of the question text
- existing VQA work handles image and text attention separately and does not actually quantify how much attention goes to each
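To make "co-attention" concrete, here is a minimal sketch of parallel co-attention in the style of Lu et al. (2016), "Hierarchical Question-Image Co-Attention for Visual Question Answering". It is not necessarily how the models studied in VQA-MHUG work. Assumptions: the weights are random stand-ins for learned parameters, and the sizes (d, k, N regions, T words) are hypothetical. The point is that one affinity matrix C couples the two modalities, so the image attention a_v and the question attention a_q are computed jointly instead of separately. These two per-modality distributions are the kind of model attention one could compare against human gaze on the image and on the question.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D vector
    e = np.exp(x - x.max())
    return e / e.sum()

def parallel_coattention(V, Q, W_b, W_v, W_q, w_hv, w_hq):
    """V: (d, N) image region features; Q: (d, T) question word features."""
    C = np.tanh(Q.T @ W_b @ V)                # (T, N) word-region affinity
    H_v = np.tanh(W_v @ V + (W_q @ Q) @ C)    # (k, N) image features informed by the question
    H_q = np.tanh(W_q @ Q + (W_v @ V) @ C.T)  # (k, T) question features informed by the image
    a_v = softmax(w_hv @ H_v)                 # (N,) attention over image regions
    a_q = softmax(w_hq @ H_q)                 # (T,) attention over question words
    v_hat = V @ a_v                           # (d,) attended image vector
    q_hat = Q @ a_q                           # (d,) attended question vector
    return a_v, a_q, v_hat, q_hat

# hypothetical sizes and random (untrained) parameters, for illustration only
rng = np.random.default_rng(0)
d, k, N, T = 8, 4, 6, 5
V = rng.standard_normal((d, N))
Q = rng.standard_normal((d, T))
W_b = rng.standard_normal((d, d))
W_v = rng.standard_normal((k, d))
W_q = rng.standard_normal((k, d))
w_hv = rng.standard_normal(k)
w_hq = rng.standard_normal(k)

a_v, a_q, v_hat, q_hat = parallel_coattention(V, Q, W_b, W_v, W_q, w_hv, w_hq)
```

Each of a_v and a_q sums to 1 over its own modality, so on its own it says where the model looks within the image or within the question, but not how attention is split between the two.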