An Advanced Image Caption Generator Using Deep Neural Networks for Automatic Description Generation and Contextual Understanding
Keywords:
Convolutional neural network (CNN), Recurrent neural network (RNN), Gated recurrent unit (GRU), Long short-term memory (LSTM)

Abstract
In this research, we conduct a thorough analysis of deep neural networks for automatically generating image captions. Given an image, the system produces a descriptive sentence in English. We examine the three components of the method: convolutional neural networks (CNNs), recurrent neural networks (RNNs), and sentence generation. After substituting three state-of-the-art architectures into the CNN component, we find that VGGNet achieves the highest BLEU score. As an additional recurrent layer, we propose a simplified variant of the Gated Recurrent Unit (GRU), implemented in MATLAB and C++. The simplified GRU produces results comparable to long short-term memory (LSTM), yet with a few adjustments it trains faster and uses less memory. Finally, we apply beam search to generate multiple candidate sentences. Experimental results show that the modified method produces captions on par with state-of-the-art methods while requiring less training memory.
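For reference, the standard GRU computes its hidden state with two gates; this is the common baseline formulation, not the paper's simplified variant, whose exact modifications are not detailed in the abstract:

\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)} \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\!\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) && \text{(candidate state)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}

Compared with the LSTM's three gates and separate cell state, the GRU requires fewer parameters per layer, which is consistent with the training-speed and memory savings reported above.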
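To illustrate the sentence-generation step, the following is a minimal beam-search sketch in C++ (one of the paper's stated implementation languages). The decoder stub next_log_probs, the toy vocabulary, and all parameter values are illustrative assumptions, not the paper's trained model:

// Minimal beam-search sketch (C++17). next_log_probs is a hypothetical
// stand-in for the decoder's per-step output; a real system would run the
// GRU/LSTM here.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

struct Hypothesis {
    std::vector<int> tokens;   // token ids generated so far
    double log_prob = 0.0;     // cumulative log-probability
};

// Toy scoring function: fixed log-probabilities over a small vocabulary,
// independent of the prefix. Purely illustrative.
std::vector<double> next_log_probs(const std::vector<int>& /*prefix*/,
                                   int vocab_size) {
    std::vector<double> lp(vocab_size);
    for (int v = 0; v < vocab_size; ++v)
        lp[v] = std::log(1.0 / vocab_size) - 0.01 * v;
    return lp;
}

std::vector<Hypothesis> beam_search(int vocab_size, int beam_width,
                                    int max_len, int end_token) {
    std::vector<Hypothesis> beam{Hypothesis{}};  // start from empty caption
    for (int step = 0; step < max_len; ++step) {
        std::vector<Hypothesis> candidates;
        for (const auto& hyp : beam) {
            // Finished hypotheses are carried over unchanged.
            if (!hyp.tokens.empty() && hyp.tokens.back() == end_token) {
                candidates.push_back(hyp);
                continue;
            }
            auto lp = next_log_probs(hyp.tokens, vocab_size);
            for (int v = 0; v < vocab_size; ++v) {
                Hypothesis ext = hyp;
                ext.tokens.push_back(v);
                ext.log_prob += lp[v];
                candidates.push_back(std::move(ext));
            }
        }
        // Keep only the beam_width highest-scoring partial captions.
        std::sort(candidates.begin(), candidates.end(),
                  [](const Hypothesis& a, const Hypothesis& b) {
                      return a.log_prob > b.log_prob;
                  });
        if (static_cast<int>(candidates.size()) > beam_width)
            candidates.resize(beam_width);
        beam = std::move(candidates);
    }
    return beam;  // best sentences, highest score first
}

int main() {
    for (const auto& hyp : beam_search(/*vocab_size=*/5, /*beam_width=*/3,
                                       /*max_len=*/4, /*end_token=*/0)) {
        std::cout << "score=" << hyp.log_prob << " tokens:";
        for (int t : hyp.tokens) std::cout << ' ' << t;
        std::cout << '\n';
    }
}

Unlike greedy decoding, which commits to the single most likely word at each step, the beam keeps beam_width partial captions alive, which is how the method produces a number of candidate sentences rather than one.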