image captioning
15 Jan 2019
This winter, I will be working on image captioning.
To do list
- Paper 1
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
- Motivation
- While top-down visual attention mechanisms are widely used in image captioning and VQA, bottom-up mechanisms have received much less attention.
- The bottom-up mechanism (based on Faster R-CNN) proposes a more diverse set of image regions for the attention to operate over.
- Paper 2
- Recurrent fusion network for image captioning
- Motivation
- Encoder-decoder frameworks are widely used in image captioning.
- Existing frameworks use only one kind of CNN as the encoder, which ties the performance of the whole framework to that of the base CNN.
- Reference source code
- Code
- In ./model/Attmodel.py, check how the top-down core class and the attention class work.
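
Before reading the reference code, it may help to have the attention step from Paper 1 in mind. Below is a minimal sketch of soft attention over a set of region features, assuming additive (tanh) scoring followed by a softmax. It is not the code from the reference repository: the function name `soft_attention`, the helper `random_matrix`, the weight names, and the toy sizes are all hypothetical, and the region features are random numbers standing in for Faster R-CNN outputs.

```python
import math
import random


def soft_attention(regions, hidden, W_v, W_h, w_a):
    """Additive soft attention over k region feature vectors.

    regions: list of k feature vectors, each of length d_v
    hidden:  decoder state, length d_h
    W_v:     d_a x d_v matrix; W_h: d_a x d_h matrix; w_a: length d_a
    Returns (weights, attended); the weights sum to 1.
    """
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    proj_h = matvec(W_h, hidden)  # same for every region
    scores = []
    for v in regions:
        proj_v = matvec(W_v, v)
        act = [math.tanh(a + b) for a, b in zip(proj_v, proj_h)]
        scores.append(sum(w * a for w, a in zip(w_a, act)))

    # Numerically stable softmax over the k regions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attended feature: weighted average of the region features.
    d_v = len(regions[0])
    attended = [sum(w * v[j] for w, v in zip(weights, regions))
                for j in range(d_v)]
    return weights, attended


def random_matrix(rows, cols, rng):
    # Hypothetical helper: small random values in place of learned weights.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
k, d_v, d_h, d_a = 5, 8, 6, 4  # toy sizes chosen for illustration only
regions = random_matrix(k, d_v, rng)  # stand-in for detector region features
hidden = [rng.uniform(-1, 1) for _ in range(d_h)]
W_v = random_matrix(d_a, d_v, rng)
W_h = random_matrix(d_a, d_h, rng)
w_a = [rng.uniform(-0.1, 0.1) for _ in range(d_a)]

weights, attended = soft_attention(regions, hidden, W_v, W_h, w_a)
```

In a real model the weights would be learned and the decoder state would come from an LSTM, so the attended feature changes at every word of the caption.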