This project demonstrates transfer learning with an ensemble of four neural networks, three residual networks and one Inception-v4, to classify hand gestures showing the numbers 0 to 9 in several regional variants (Chinese, US and UK). The architectures are described in the paper "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning".
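As an illustration of how an ensemble of classifiers can be combined, here is a minimal sketch that averages the class probabilities produced by each model and picks the highest-scoring label. Probability averaging is an assumption made for this example; the project may combine its models differently. The function name `ensemble_predict` and the sample probabilities are hypothetical.

```python
def ensemble_predict(prob_vectors, labels):
    """Average per-model class probabilities and return (label, score).

    Assumption: each model outputs one probability per label, in the
    same label order. This is a sketch, not the project's actual code.
    """
    if not prob_vectors:
        raise ValueError("at least one model output is required")
    n = len(labels)
    for probs in prob_vectors:
        if len(probs) != n:
            raise ValueError("each model must output one probability per label")
    # Unweighted mean over models, computed per class.
    averaged = [sum(p[i] for p in prob_vectors) / len(prob_vectors) for i in range(n)]
    best = max(range(n), key=averaged.__getitem__)
    return labels[best], averaged[best]


# Hypothetical outputs from four models (three residual nets, one Inception-v4).
labels = ["Zero", "One", "Two"]
outputs = [
    [0.1, 0.7, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
    [0.1, 0.6, 0.3],
]
label, score = ensemble_predict(outputs, labels)
```

Averaging lets a confident majority outvote a single model that disagrees, as the third model does in this example.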
Alongside classification, we use a single object-detection deep neural network whose architecture is described in the paper "SSD: Single Shot MultiBox Detector". It can detect objects of different classes, even when several instances appear in the same image.
We use TensorFlow for both the object detection and classification tasks. The training data consists of images of hand gestures for number signs, annotated with bounding boxes for the hands in PASCAL VOC format for object detection (some images contain one hand, others two). For classification, the images were also labeled with these categories:
- Zero
- One
- Two
- Chinese three
- US three
- UK three
- Four
- Five
- Chinese six
- Chinese seven
- Chinese eight
- Chinese nine
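To make the annotation setup concrete, the sketch below reads bounding boxes from a PASCAL VOC XML annotation and builds a label-to-index map for the categories listed above. The sample annotation, the object name `hand`, the file name, and the index order are assumptions for illustration; the project's actual files may differ.

```python
import xml.etree.ElementTree as ET

# Categories as listed above; the index order is an assumption.
CATEGORIES = [
    "Zero", "One", "Two", "Chinese three", "US three", "UK three",
    "Four", "Five", "Chinese six", "Chinese seven", "Chinese eight",
    "Chinese nine",
]
LABEL_TO_INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def parse_voc(xml_text):
    """Return a list of {'name', 'box'} dicts from a PASCAL VOC annotation.

    'box' is (xmin, ymin, xmax, ymax) in pixel coordinates.
    """
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(float(bndbox.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append({"name": obj.findtext("name"), "box": coords})
    return boxes


# Hypothetical annotation for an image containing two hands.
SAMPLE = """
<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>hand</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>hand</name>
    <bndbox><xmin>150</xmin><ymin>30</ymin><xmax>260</xmax><ymax>240</ymax></bndbox>
  </object>
</annotation>
"""
hands = parse_voc(SAMPLE)
```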
The web application uses Python with a Flask-based backend, which loads these neural network models into RAM and presents a user interface for classifying and localizing a hand-signal image.
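A backend of this kind might be structured roughly as follows. This is a minimal sketch under stated assumptions: the `/predict` route, the `image` form field, and the `load_models` helper are hypothetical, and the placeholder model only reports the size of the upload instead of running the TensorFlow networks.

```python
import io

from flask import Flask, jsonify, request

app = Flask(__name__)


def load_models():
    """Hypothetical loader standing in for restoring the trained networks."""
    def classify(image_bytes):
        # Placeholder result; a real model would return a label and a box.
        return {"label": "placeholder", "num_bytes": len(image_bytes)}
    return {"classify": classify}


# Loaded once at import time so the models stay in RAM between requests.
MODELS = load_models()


@app.route("/predict", methods=["POST"])
def predict():
    upload = request.files.get("image")
    if upload is None:
        return jsonify(error="no image uploaded"), 400
    return jsonify(MODELS["classify"](upload.read()))
```

Loading the models once at startup, not inside the request handler, avoids paying the load cost on every request.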
The web app interface and its results are shown below: