image captioning

15 Jan 2019

This winter, I will be working on image captioning.

Paper 1
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
- Motivation
  - While top-down visual attention mechanisms are widely used in image captioning and VQA, bottom-up mechanisms have been far less common.
  - The bottom-up mechanism (based on Faster R-CNN) can provide a more diverse set of image regions to attend to, rather than a uniform grid of CNN features.
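The following is a minimal sketch of how the top-down attention step weights the bottom-up regions, following the paper's formulation: a score a_i = w_a^T tanh(W_va v_i + W_ha h) for each region feature v_i, a softmax over the scores, and a weighted sum of the features. It assumes the K region features have already been extracted by the Faster R-CNN stage. The function and parameter names are illustrative (not taken from the reference code), the weights are random, and plain Python lists are used in place of tensors.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_down_attention(regions, h, W_va, W_ha, w_a):
    """Weight K bottom-up region features by their relevance to the decoder state h.

    regions: K feature vectors v_i, each of length D
    h:       decoder hidden state of length H
    W_va:    A x D matrix, W_ha: A x H matrix, w_a: vector of length A
    Returns (alpha, v_hat): the K attention weights and the attended feature.
    """
    h_proj = matvec(W_ha, h)  # the h term is shared by every region
    scores = []
    for v in regions:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_va, v), h_proj)]
        scores.append(sum(w * x for w, x in zip(w_a, hidden)))  # a_i = w_a^T tanh(...)
    alpha = softmax(scores)
    # v_hat = sum_i alpha_i * v_i, computed one feature dimension at a time
    v_hat = [sum(a * v[d] for a, v in zip(alpha, regions)) for d in range(len(regions[0]))]
    return alpha, v_hat


# Usage with random weights (K regions, feature dim D, hidden dim H, attention dim A).
random.seed(0)
K, D, H, A = 3, 4, 5, 6


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


regions = rand_matrix(K, D)
h = [random.uniform(-1, 1) for _ in range(H)]
W_va, W_ha = rand_matrix(A, D), rand_matrix(A, H)
w_a = [random.uniform(-1, 1) for _ in range(A)]
alpha, v_hat = top_down_attention(regions, h, W_va, W_ha, w_a)
print([round(a, 3) for a in alpha])  # K weights that sum to 1
```

Because the decoder state h enters every score, the weights change at each decoding step. This is the top-down part. The candidate regions themselves come from the bottom-up detector.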
Paper 2
- Recurrent Fusion Network for Image Captioning
- Motivation
  - In image captioning, encoder-decoder frameworks are widely used.
  - Existing frameworks use only one kind of CNN as the encoder, so the performance of the whole framework is limited by the performance of that base CNN.
Reference source code
- Code
- In ./model/Attmodel.py, check how the top-down core class and the attention class work.
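As a reading aid for the top-down core, here is a hedged sketch of one decoding step of the two-LSTM decoder described in the Bottom-Up and Top-Down paper. It is written from the paper's description, not from ./model/Attmodel.py, so the reference code may be organized differently. In each step, the attention LSTM reads [previous language-LSTM state, mean region feature, word embedding]. Its output then drives soft attention over the regions. Finally, the language LSTM reads [attended feature, attention-LSTM output]. All names, including `top_down_core_step` and the parameter-dict keys, are hypothetical. Weights are random, and the final projection to the vocabulary is omitted.

```python
import math
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell(x, h, c, W, b):
    """One LSTM step. W has 4*H rows acting on concat(x, h); b has 4*H entries."""
    H = len(h)
    z = [zi + bi for zi, bi in zip(matvec(W, x + h), b)]
    i = [sigmoid(v) for v in z[0:H]]          # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]      # forget gate
    o = [sigmoid(v) for v in z[2 * H:3 * H]]  # output gate
    g = [math.tanh(v) for v in z[3 * H:]]     # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def attend(regions, h, W_va, W_ha, w_a):
    """Additive soft attention over region features, returning the attended feature."""
    h_proj = matvec(W_ha, h)
    scores = [sum(w * math.tanh(a + b) for w, a, b in zip(w_a, matvec(W_va, v), h_proj))
              for v in regions]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    return [sum(a * v[d] for a, v in zip(alpha, regions)) for d in range(len(regions[0]))]


def top_down_core_step(word_emb, regions, state, p):
    """One decoding step of a two-LSTM top-down core (hypothetical parameter dict p)."""
    h1, c1, h2, c2 = state
    K = len(regions)
    v_bar = [sum(v[d] for v in regions) / K for d in range(len(regions[0]))]  # mean region feature
    # Attention LSTM: previous language-LSTM output, mean feature, current word embedding.
    h1, c1 = lstm_cell(h2 + v_bar + word_emb, h1, c1, p["W_att"], p["b_att"])
    v_hat = attend(regions, h1, p["W_va"], p["W_ha"], p["w_a"])
    # Language LSTM: attended feature and attention-LSTM output.
    h2, c2 = lstm_cell(v_hat + h1, h2, c2, p["W_lang"], p["b_lang"])
    return h2, (h1, c1, h2, c2)


# Usage: K regions, feature dim D, word-embedding dim E, hidden dim H, attention dim A.
random.seed(1)
K, D, E, H, A = 3, 4, 2, 5, 6


def rand(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


p = {
    "W_att": rand(4 * H, 2 * H + D + E), "b_att": [0.0] * (4 * H),
    "W_lang": rand(4 * H, D + 2 * H), "b_lang": [0.0] * (4 * H),
    "W_va": rand(A, D), "W_ha": rand(A, H), "w_a": [random.uniform(-0.5, 0.5) for _ in range(A)],
}
regions = rand(K, D)
state = ([0.0] * H, [0.0] * H, [0.0] * H, [0.0] * H)
out, state = top_down_core_step([0.1] * E, regions, state, p)
print(len(out))  # language-LSTM output, which would feed a softmax over the vocabulary
```

Splitting the decoder into two LSTMs lets the first layer decide where to look and the second layer decide what word to emit.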

Geonhwa Jeong