An Advanced Image Caption Generator Using Deep Neural Networks for Automatic Description Generation and Contextual Understanding

Authors

  • S. Benitta Sherine, Department of Computer Science Engineering, Dhaanish Ahmed College of Engineering, Chennai, Tamil Nadu, India
  • S. Ramesh Kumar, Department of Computer Science Engineering, Dhaanish Ahmed College of Engineering, Chennai, Tamil Nadu, India
  • B. Vaidianathan, Department of Electronics & Communication Engineering, Dhaanish Ahmed College of Engineering, Chennai, Tamil Nadu, India
  • M. Sree Rajeswari, Department of Computer Science Engineering, Dhaanish Ahmed College of Engineering, Chennai, Tamil Nadu, India

Keywords:

Convolutional neural network (CNN), Recurrent neural network (RNN), Gated Recurrent Units (GRU), Long short-term memory (LSTM)

Abstract

In this research, we conduct a thorough analysis of a deep neural network method for automatically generating image captions. Given an image, the method outputs a descriptive sentence in English. We examine the three parts of the method: the convolutional neural network (CNN), the recurrent neural network (RNN), and sentence generation. When three cutting-edge architectures are substituted for the CNN component, VGGNet achieves the highest BLEU score. As an additional recurrent layer, we propose a simplified version of the Gated Recurrent Unit (GRU), implemented in MATLAB and C++. The simplified GRU produces results similar to those of long short-term memory (LSTM) and, with a few tweaks, can speed up training and conserve memory. Lastly, we use Beam Search to generate multiple candidate sentences. Based on the experimental results, the modified method produces captions that are on par with state-of-the-art methods while using less training memory. However, it still encounters problems with managing inventory and logistics, which makes it difficult to guarantee efficient shipping and delivery processes or to prevent digital content piracy and copyright infringement.
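The Beam Search step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB or C++ implementation. The function `beam_search`, the helper `toy_step`, and the `TOY_BIGRAMS` table are hypothetical names introduced here, and the toy bigram probabilities stand in for the next-word distribution that a trained CNN and RNN captioning model would supply.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Return up to beam_width (tokens, log_prob) pairs, best first.

    step_fn(tokens) must return a dict mapping each possible next token
    to its probability given the tokens generated so far.
    """
    beams = [([start_token], 0.0)]  # unfinished hypotheses
    finished = []                   # hypotheses that reached end_token
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_width most probable extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]

# Hypothetical toy language model: next-word probabilities depend only
# on the previous word. A real captioner would condition on the image.
TOY_BIGRAMS = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"runs": 0.7, "</s>": 0.3},
    "cat": {"sleeps": 0.6, "</s>": 0.4},
    "runs": {"</s>": 1.0},
    "sleeps": {"</s>": 1.0},
}

def toy_step(tokens):
    return TOY_BIGRAMS[tokens[-1]]

if __name__ == "__main__":
    for tokens, score in beam_search(toy_step, "<s>", "</s>", beam_width=3):
        print(" ".join(tokens), round(math.exp(score), 3))
```

With `beam_width=1` the search reduces to greedy decoding, which in this toy table commits to the locally most probable first word and misses the sentence with the highest overall probability. A wider beam keeps several partial sentences alive and returns multiple ranked candidates.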

Published

2024-12-17

How to Cite

Sherine, S. B., Kumar, S. R., Vaidianathan, B., & Rajeswari, M. S. (2024). An Advanced Image Caption Generator Using Deep Neural Networks for Automatic Description Generation and Contextual Understanding. American Journal of Pediatric Medicine and Health Sciences (2993-2149), 2(12), 128–143. Retrieved from http://grnjournal.us/index.php/AJPMHS/article/view/6356