Convolutional Neural Network Based Hand Gesture Recognition System



Robust hand gesture recognition has long been essential to human-computer interaction. Sign-language gestures are intricate and varied, which makes them difficult for many people to understand and hinders communication between people with and without speech impairments. The current boom in deep learning has driven a great deal of active, practical research in computer vision. The objective of this study is to recognize hand gestures by using a camera to quickly track the region of interest (ROI), in this case the hand region, within the image. This project proposes an algorithm for real-time hand gesture recognition based on convolutional neural networks (CNNs). The proposed CNN is expected to achieve high accuracy on a dataset of thirty-six hand gestures with 400 images per gesture.

The goal of this project is to recognize a set of static and dynamic hand gestures while maintaining both accuracy and speed; the recognized gestures are then used to control the computer. A CNN-based classifier for hand shape recognition is built by applying transfer learning to a convolutional neural network that has already been trained on a large dataset. A method has been developed that extracts the hand region from the image and uses 2D convolutional neural networks to learn and predict gestures. To reduce over-fitting and improve the generalization of the gesture classifier, an efficient spatio-temporal data augmentation strategy that distorts the input volumes of hand gestures is also proposed. This augmentation also incorporates existing spatial augmentation methods.
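The paper does not specify exactly how the gesture volumes are distorted. The following Python sketch illustrates one possible spatio-temporal augmentation, assuming each dynamic gesture is stored as a NumPy array of shape (T, H, W): a random temporal rescaling that repeats or skips frames, followed by a random translation shared by all frames. The function names and parameters (`temporal_resample`, `spatial_shift`, `augment_volume`, `max_scale`, `max_shift`) are hypothetical.

```python
import numpy as np


def temporal_resample(volume: np.ndarray, rng: np.random.Generator,
                      max_scale: float = 0.2) -> np.ndarray:
    """Randomly re-time a clip of shape (T, H, W); T is unchanged."""
    t = volume.shape[0]
    scale = rng.uniform(1.0 - max_scale, 1.0 + max_scale)
    # Re-index frames around the clip centre: scale < 1 stretches the middle
    # of the gesture over T frames (some frames repeated), scale > 1
    # compresses it (some frames skipped, indices clamped at the ends).
    centre = (t - 1) / 2.0
    positions = centre + (np.arange(t) - centre) * scale
    idx = np.clip(np.rint(positions), 0, t - 1).astype(int)
    return volume[idx]


def spatial_shift(volume: np.ndarray, rng: np.random.Generator,
                  max_shift: int = 4) -> np.ndarray:
    """Translate every frame by the same random (dy, dx), padding with zeros."""
    _, h, w = volume.shape
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    out = np.zeros_like(volume)
    # Copy the overlapping window from source to destination coordinates.
    out[:, max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)] = \
        volume[:, max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    return out


def augment_volume(volume: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Temporal distortion followed by a spatial one (hypothetical pipeline)."""
    return spatial_shift(temporal_resample(volume, rng), rng)


rng = np.random.default_rng(0)
clip = rng.random((16, 50, 50))          # hypothetical 16-frame gesture clip
augmented = augment_volume(clip, rng)
print(augmented.shape)                   # (16, 50, 50)
```

Because the same shift is applied to every frame, the motion path of the hand is preserved while its position in the frame varies; each call draws a new distortion, so repeated epochs see different versions of the same gesture.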


Project Work

Deep convolutional neural networks have recently been applied successfully to a range of recognition tasks. Multi-column deep CNNs, which combine several parallel networks, have been shown to improve the recognition rates of single networks by up to 80% across a variety of image classification tasks. Although hand gesture recognition has been studied extensively, comparatively little work has addressed sign-language alphabets. Singha et al. produced a database of 240 images for each of twenty-four sign-language symbols. Their system classified images with 97% accuracy, but the images were restricted to a limited number of gestures and static background conditions. Liao et al., on the other hand, used an Intel RealSense RGB-D (depth) sensor and depth-based methods to separate the hand region from the background.
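Multi-column networks are commonly combined by averaging the class probabilities predicted by the individual columns. A minimal NumPy sketch of that combination step follows; the softmax outputs of the three columns are made-up numbers, and the function `multi_column_predict` is hypothetical, an illustration rather than part of this project's system.

```python
from typing import Sequence

import numpy as np


def multi_column_predict(column_probs: Sequence[np.ndarray]) -> np.ndarray:
    """Average class-probability outputs from several parallel CNN columns.

    Each array has shape (n_samples, n_classes); the result has the same shape.
    """
    return np.mean(np.stack(column_probs), axis=0)


# Hypothetical softmax outputs of three columns for two images and three classes.
col_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
col_b = np.array([[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]])
col_c = np.array([[0.5, 0.2, 0.3], [0.2, 0.2, 0.6]])

avg = multi_column_predict([col_a, col_b, col_c])
print(avg.argmax(axis=1))  # [0 2]
```

Column A alone would assign the second image to class 1; averaging it with the other two columns moves the prediction to class 2, the kind of error correction that lets a multi-column ensemble outperform a single network.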

  • In this project, a dataset of 400 images for each of 36 hand gestures (27 new gestures and 9 gestures used in the existing system) was acquired with a webcam to evaluate the model.
  • Each image is 50 × 50 pixels. The color image is converted into a black-and-white (binary) image in which the skin pixels are separated from the background; Gaussian blur is used to smooth the image and suppress noise during this conversion.
  • Transfer learning is used to train a CNN-based classifier for hand shape identification on top of a convolutional neural network previously trained on a large dataset. The pretrained model used here is VGG16.
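A sketch of this transfer-learning setup in TensorFlow/Keras follows. Only the use of VGG16, the 36 classes, and the 50 × 50 image size come from the project description; the frozen convolutional base, the 128-unit dense head with dropout, the Adam optimizer, and repeating the single-channel images into three channels (VGG16 expects three-channel input) are assumptions made for illustration.

```python
from typing import Optional

import numpy as np
import tensorflow as tf

NUM_CLASSES = 36  # gestures in the dataset described above
IMG_SIZE = 50     # images are 50 x 50 pixels


def to_vgg_input(images: np.ndarray) -> np.ndarray:
    """Convert (N, 50, 50) single-channel images with values in [0, 255]
    into the (N, 50, 50, 3) format expected by VGG16."""
    rgb = np.repeat(images[..., np.newaxis].astype("float32"), 3, axis=-1)
    # VGG16's own preprocessing: RGB -> BGR and ImageNet mean subtraction.
    return tf.keras.applications.vgg16.preprocess_input(rgb)


def build_gesture_classifier(num_classes: int = NUM_CLASSES,
                             weights: Optional[str] = "imagenet") -> tf.keras.Model:
    # Pretrained convolutional base without VGG16's original 1000-class head.
    base = tf.keras.applications.VGG16(include_top=False, weights=weights,
                                       input_shape=(IMG_SIZE, IMG_SIZE, 3))
    base.trainable = False  # freeze transferred features; only the new head trains

    inputs = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)   # assumed head size
    x = tf.keras.layers.Dropout(0.5)(x)                    # assumed regularization
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With `weights="imagenet"`, Keras downloads the pretrained ImageNet weights on first use; the model would then be trained with `model.fit(to_vgg_input(train_images), train_labels, ...)`. Freezing the base keeps a small gesture dataset from overwriting the general-purpose features learned on the large dataset.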


Comparison Table

In this project, a live video camera records the hand gestures. This makes it challenging to use the same software under varied lighting conditions. To address this, the RGB (Red, Green, Blue) image is converted into an HSV (Hue, Saturation, Value) image, which separates color (hue) from brightness (value), for hand gesture identification. Thresholding is then applied to reduce background noise.




[Figure: Data: Tested; CNN Model; Resulting image]


After the data collected from all five subjects were shuffled, the dataset was split 70:30 into training and testing sets. Artificial data synthesis was applied in real time to train the static gestures. Because this synthesis was applied only to the training set, the testing set contains only undistorted images and is considerably easier to predict, so the testing accuracy is higher than the training accuracy.

An efficient approach for recognizing dynamic hand gestures has been developed using 2D convolutional neural networks. To prevent over-fitting, the proposed classifier augments the data with spatio-temporal distortions. A comprehensive evaluation shows that combining low- and high-resolution sub-networks significantly increases classification accuracy, and that the proposed data augmentation strategy is crucial for improved performance. The resulting system therefore compares favorably with existing systems, offering higher accuracy together with measures that reduce over-fitting and sensitivity to poor lighting conditions.
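The shuffled 70:30 split can be sketched in NumPy as below. The function name `shuffle_split` and the fixed seed are assumptions, and the paper does not say whether the split was stratified by gesture class. Any artificial data synthesis would be applied to the returned training set only, leaving the test images undistorted as described above.

```python
import numpy as np


def shuffle_split(images: np.ndarray, labels: np.ndarray,
                  train_fraction: float = 0.7, seed: int = 0):
    """Shuffle paired samples and split them into (train, test) sets."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))  # one permutation keeps image/label pairs aligned
    n_train = int(round(train_fraction * len(images)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return ((images[train_idx], labels[train_idx]),
            (images[test_idx], labels[test_idx]))
```

If the whole dataset of 36 gestures × 400 images = 14,400 images is split this way, the training set holds 10,080 images and the testing set 4,320.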



I would like to thank my teachers, Prashant Pal, Shashank Kumar, Pawan Kumar Patel, and Saurabh Bansod, for their guidance and support in completing my project and research paper on "Convolutional Neural Network based Hand Gesture Recognition System". Their expertise and encouragement were invaluable in shaping my understanding of the topic. I am also grateful to my colleagues and peers who provided feedback and suggestions that improved the project's quality. Lastly, I thank the institutions and organizations that provided resources and infrastructure for the research. Thank you all for contributing to the successful completion of this project and the publication of the research paper.
