Dense Captioning

Year: 2016
Authors: Justin Johnson, Andrej Karpathy, Li Fei-Fei
Journal: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Programming languages: CSS, HTML, JavaScript, Jupyter Notebook, Lua, Python

We introduce the dense captioning task, which requires a computer vision system to both localize salient regions in images and describe them in natural language. The dense captioning task generalizes object detection when the descriptions consist of a single word, and image captioning when one predicted region covers the full image. To address the localization and description tasks jointly, we propose a Fully Convolutional Localization Network (FCLN) architecture that processes an image with a single, efficient forward pass, requires no external region proposals, and can be trained end-to-end with a single round of optimization. The architecture is composed of a Convolutional Network, a novel dense localization layer, and a Recurrent Neural Network language model that generates the label sequences.
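
To make the relationship between the three tasks concrete, here is a minimal sketch of what a dense captioning output might look like as a data structure, and how the two special cases named in the abstract fall out of it. This is not the authors' code: the names `RegionCaption`, `is_detection_like`, and `is_image_captioning_like`, the `(x, y, width, height)` box layout, and the sample values are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegionCaption:
    """One dense-captioning prediction (hypothetical structure)."""
    box: Tuple[float, float, float, float]  # assumed layout: (x, y, width, height) in pixels
    caption: str                            # natural-language description of the region
    score: float                            # confidence assigned to the region


def is_detection_like(regions: List[RegionCaption]) -> bool:
    """True when every description is a single word (the object-detection case)."""
    return bool(regions) and all(len(r.caption.split()) == 1 for r in regions)


def is_image_captioning_like(
    regions: List[RegionCaption], image_width: float, image_height: float
) -> bool:
    """True when exactly one region is predicted and it covers the full image."""
    if len(regions) != 1:
        return False
    x, y, w, h = regions[0].box
    return x == 0 and y == 0 and w == image_width and h == image_height


# Illustrative, made-up predictions for a 640x480 image.
dense = [
    RegionCaption((10, 20, 100, 50), "a dog lying on the grass", 0.9),
    RegionCaption((200, 40, 60, 80), "red frisbee in the air", 0.7),
]
detection = [
    RegionCaption((10, 20, 100, 50), "dog", 0.9),
    RegionCaption((200, 40, 60, 80), "frisbee", 0.7),
]
full_image = [
    RegionCaption((0, 0, 640, 480), "a dog chasing a frisbee in a park", 0.8),
]
```

Under these assumptions, `detection` satisfies `is_detection_like`, `full_image` satisfies `is_image_captioning_like(full_image, 640, 480)`, and `dense` satisfies neither, which is the general case the task is defined for.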
