[1] |
Farhadi A, Hejrati M,Amin M S,et al. Every picture tells a story:Generating sentences from images[C]∥Proc of the 11th European Conference on Computer Vision,2010:15-29.
|
[2] |
Liu M F, Li L J, Hu H J, et al. Image caption generation with dual attention mechanism [J].Information Processing & Management,2020,57(2):102178.
|
[3] |
Vaswani A,Shazeer N,Parmar N,et al. Attention is all you need[C]∥Proc of
|
|
the 31st International Conference on Neural Information Processing Systems, 2017:6000-6010.
|
[4] |
Kulkarni G,Premraj V,Ordonez V,et al. Babytalk:Understanding and generating simple image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(12):2891-2903.
|
[5] |
Vinyals O,Toshev A,Bengio S,et al. Show and tell:A neural image caption generator[C]∥Proc of the 2015 IEEE Confe- rence on Computer Vision and Pattern Recognition,2015:3156-3164.
|
[6] |
Karpathy A, Li F F. Deep visual-semantic alignments for generating image descriptions[C]∥Proc of the 2015 IEEE Conference on Computer Vision and Pattern Recognition,2015:3128-3137.
|
[7] |
Mao J H, Xu W, Yang Y,et al. Deep captioning with multimodal recurrent neural networks (m-RNN)[J]. arXiv:1412.6632,2014.
|
[8] |
Jia X,Gavves E,Fernando B,et al. Guiding the long-short term memory model for image caption generation[C]∥Proc of the 2015 IEEE International Conference on Computer Vision,2015:2407-2415.
|
[9] |
Wu Q, Shen C H,Liu L Q,et al. What value do explicit high level concepts have in vision to language problems?[C]∥Proc of the 2016 IEEE Conference on Computer Vision and Pattern Recognition,2016:203-212.
|
[10] |
Xu K,Ba J,Kiros R,et al. Show, attend and tell:Neural image caption generation with visual attention[C]∥Proc of the 32nd International Conference on Machine Learning,2015:2048-2057.
|
[11] |
Lu J S,Xiong C M,Parikh D,et al. Knowing when to look:Adaptive attention via a visual sentinel for image captioning[C]∥Proc of the 2017 IEEE Conference on Computer Vision and Pattern Recognition,2017:375-383.
|
[12] |
Chen L, Zhang H W, Xiao J,et al. SCA-CNN:Spatial and channel-wise attention in convolutional networks for image captioning[C]∥Proc of the 2017 IEEE Conference on Computer Vision and Pattern Recognition,2017:5659-5667.
|
[13] |
Li X R, Lan W Y,Dong J F,et al. Adding Chinese captions to images[C]∥Proc of the 2016 ACM International Confe- rence on Multimedia Retrieval,2016:271-275.
|
[14] |
Szegedy C,Liu W,Jia Y Q,et al. Going deeper with convolutions[C]∥Proc of the 32nd International Conference on Machine Learning,2015:1-9.
|
[15] |
Rennie S J,Marcheret E,Mroueh Y,et al. Self-critical sequence training for image captioning[C]∥Proc of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017:7008-7024.
|
[16] |
Anderson P,He X D,Buehler C,et al. Bottom-up and top-down attention for image captioning and visual question answering[C]∥Proc of the 2018 IEEE Conference on Computer Vision and Pattern Recognition,2018:6077-6086.
|
[17] |
Zhang Y L, Tian Y P, Kong Y,et al. Residual dense network for image super-resolution[J].arXiv:1802.08797,2018.
|
[18] |
Shen Y Y,Tan X, He D,et al. Dense information flow for neural machine translation[C] ∥Proc of the 2018 Confe- rence of the North American Chapter of the Association for Computational Linguistics,2018:1294-1303.
|
[19] |
Lin T Y,Maire M,Belongie S,et al. Microsoft COCO:Common objects in context[C]∥Proc of European Conference on Computer Vision,2014:740-755.
|
[20] |
Papineni K,Roukos S,Ward T,et al. BLEU:A method for automatic evaluation of machine translation[C]∥Proc of the 40th Annual Meeting of the Association for Computational Linguistics, 2002:311-318.
|
[21] |
Denkowski M,Lavi A. Meteor universal:Language specific
|
|
translation evaluation for any target language[C]∥Proc of the 9th Workshop on Statistical Machine Translation,2014:376-380.
|
[22] |
Lin C Y. ROUGE:A package for automatic evaluation of summaries
|
[C] |
∥Proc of the Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004, 2004:1-10.
|
[23] |
Vedantam R, Zitnick C L, Parikh D, et al. CIDEr: Consensus-based image description evaluation[C]∥Proc of 2015 the IEEE Conference on Computer Vision and Pattern Recognition,2015:4566-4575.
|
[24] |
Anderson P,Fernando B,Johnson M,et al. SPICE:Semantic propositional image caption evaluation[C]∥Proc of European Confe- rence on Computer Vision,2016:382-398.
|
[25] |
He K M, Zhang X Y,Ren S Q,et al. Deep residual learning for image recognition[C]∥Proc of the 2016 IEEE Confe- rence on Computer Vision and Pattern Recognition,2016:770-778.
|
[26] |
Russakovsky O,Deng J,Su H,et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision,2015,115(3):211-252.
|
[27] |
Ren S, He K, Girshick R, et al. Faster R-CNN:Towards real-time object detection with region proposal networks
|
[J] |
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017,39(6):1137-1149.
|
[28] |
Kingma D P,Ba J. Adam:A method for stochastic optimization[J]. arXiv:1412.6980,2014.
|
[29] |
Wiseman S, Rush A M. Sequence-to-sequence learning as beam-search optimization[J]. arXiv:1606. 02960,2016.
|
[30] |
Ioffe S,Szegedy C. Batch normalization:Accelerating deep network training by reducing internal covariate shift[C]∥Proc of the 32nd International Conference on Machine Learning,2015:448-456.
|