
A Study on Different Deep Learning Architectures on Image Captioning

Kiranmai Rage

Publication year: 2022
Citations: 2

Abstract

In recent years, deep learning has shown exemplary results in many application areas. Researchers from different disciplines have incorporated deep learning into their work to solve interdisciplinary problems. Application areas include speech recognition, natural language processing, computer vision, networking, healthcare, IoT, robotics, agriculture, remote sensing, and many others. This paper reviews various deep learning approaches, including the Convolutional Neural Network (CNN), Deep Neural Network (DNN), Auto-Encoder (AE), Recurrent Neural Network (RNN) with Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM) and ConvLSTM, Deep Reinforcement Learning (DRL), Generative Adversarial Network (GAN), and Deep Belief Network (DBN). Various feature extraction procedures have been designed to extract the most important features of an image. A CNN's considerable learning capacity comes from its use of several feature extraction stages that learn progressively from data. Popular CNN architectures include LeNet, AlexNet, ZFNet/Clarifai, VGGNet, GoogLeNet, ResNet, DenseNet, FractalNet, and CapsuleNet. Recent work has extended to combining two CNN models, such as Inception and ResNetV2, building on the results of existing deep learning approaches. Image captioning requires identifying the significant objects in an image, their properties, and the relationships among them, and the generated sentences must be syntactically and semantically correct. Deep learning approaches can handle these intricacies and problems of image captioning. The paper also presents a comparison between these models and discusses the standard datasets used to run and evaluate deep learning approaches.
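As a rough illustration of what a single "feature extraction stage" in a CNN might look like, the sketch below chains a convolution, a ReLU, and a 2x2 max pooling step in plain Python. It is not taken from the paper: the function names, the toy 6x6 image, and the hand-picked edge kernel are all assumptions made for this example, and a real CNN would learn its kernel values from data rather than have them fixed.

```python
# Toy illustration only: one feature extraction stage = convolution -> ReLU -> max pooling.
# All names and values here are hypothetical; a real CNN learns its kernels from data.

def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation on a list-of-lists image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Clamp negative responses to zero."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

def stage(image, kernel):
    """One stage: convolve, apply ReLU, then pool."""
    return max_pool2x2(relu(conv2d(image, kernel)))

# Assumed toy input: dark left half (0), bright right half (1).
IMAGE = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Assumed hand-picked kernel that responds to dark-to-bright vertical edges.
EDGE_KERNEL = [[-1, 1], [-1, 1]]

features = stage(IMAGE, EDGE_KERNEL)
print(features)
```

The pooled feature map is nonzero only where the dark-to-bright edge falls. Stacking several such stages, each with learned kernels, is the general pattern shared by the architectures listed in the abstract.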

Keywords

Closed captioning, Computer science, Image (mathematics), Artificial intelligence, Deep learning, Computer vision, Natural language processing, Multimedia
