Skip to content

Latest commit

 

History

History
185 lines (134 loc) · 7.68 KB

File metadata and controls

185 lines (134 loc) · 7.68 KB

Stars Forks GitHub contributors Language GitHub HitCount

Sign Language Interpreter using Deep Learning

A sign language interpreter using live video feed from the camera. The project was completed in 24 hours as part of HackUNT-19, the University of North Texas's annual Hackathon. You can view the project demo on YouTube.

Table of contents

General info

The theme at HACK UNT 19 was to use technology to improve accessibility by finding a creative solution to benefit the lives of those with a disability. We wanted to make it easy for 70 million deaf people across the world to be independent of translators for there daily communication needs, so we designed the app to work as a personal translator 24*7 for the deaf people.

Demo

Example screenshot

Example screenshot

Example screenshot

The entire demo of the project can be found on YouTube.

Screenshots

Example screenshot Example screenshot

Technologies and Tools

  • Python
  • TensorFlow
  • Keras
  • OpenCV

Setup

  • Use provided Makefile to setup the dependencies for the source file. Eg.

mingw-32.exe install

Process

  • Run set_hand_histogram.py to set the hand histogram for creating gestures.
  • Once you get a good histogram, save it in the code folder, or you can use the histogram created by us that can be found here.
  • By running create_gestures.py, your hand gestures are stored in a produced database and subsequently labelled with the use of OpenCV. Click on 'c' when you're ready to start or pause the recording of your hand signs. Also, the incrementing digits on the bottom-left hand corner of the window will provide an idea of how many .jpeg images are being recorded.
  • Add different variations to the captured gestures by flipping all the images with rotate_images.py.
  • Run load_images.py to split all the captured gestures into training, validation and test sets.
  • To view all the gestures, run display_gestures.py .
  • Train the model using Keras by running cnn_model_train.py.
  • Run final.py. This will open up the gesture recognition window which will use your webcam to interpret the trained American Sign Language gestures.

Code Examples

# Model Traiining using CNN

import numpy as np
import pickle
import cv2, os
from glob import glob
from keras import optimizers
from keras.api.models import Sequential
from keras.api.layers import Dense
from keras.api.layers import Dropout
from keras.api.layers import Flatten
from keras.api.layers import Conv2D
from keras.api.layers import MaxPooling2D
from keras.api.callbacks import ModelCheckpoint
from keras import backend as K
import keras as s
s.backend.set_image_data_format('channels_first')
from keras import utils as np_utils

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

def get_image_size():
	img = cv2.imread('gestures/1/1.jpg', 0)
	return img.shape

def get_num_of_classes():
	return len(glob('gestures/*'))

image_x, image_y = get_image_size()
print(f"{image_x}, {image_y}")

def cnn_model():
	num_of_classes = get_num_of_classes()
	model = Sequential()
	model.add(Conv2D(16, (2, 2), input_shape=(image_x, image_y, 1), activation='relu', padding='same'))
	model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))

	model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
	model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))

	model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
	model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))

	model.add(Flatten())
	model.add(Dense(128, activation='relu'))
	model.add(Dropout(0.2))
	model.add(Dense(num_of_classes, activation='softmax'))

	sgd = optimizers.SGD(learning_rate=1e-2)
	model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

	filepath="cnn_model_keras2.keras"
	checkpoint1 = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
	callbacks_list = [checkpoint1]

	return model, callbacks_list

def train():
	with open("train_images", "rb") as f:
		train_images = np.array(pickle.load(f))
	with open("train_labels", "rb") as f:
		train_labels = np.array(pickle.load(f), dtype=np.int32)

	with open("val_images", "rb") as f:
		val_images = np.array(pickle.load(f))
	with open("val_labels", "rb") as f:
		val_labels = np.array(pickle.load(f), dtype=np.int32)

	train_images = np.reshape(train_images, (train_images.shape[0], image_x, image_y, 1))
	val_images = np.reshape(val_images, (val_images.shape[0], image_x, image_y, 1))
	train_labels = np_utils.to_categorical(train_labels, num_classes=get_num_of_classes())
	val_labels = np_utils.to_categorical(val_labels, num_classes=get_num_of_classes())

	model, callbacks_list = cnn_model()
	model.summary()
	model.fit(train_images, train_labels, validation_data=(val_images, val_labels), epochs=15, batch_size=500, callbacks=callbacks_list)

	scores = model.evaluate(val_images, val_labels, verbose=0)
	print("CNN Error: %.2f%%" % (100-scores[1]*100))
	model.save('cnn_model_keras2.h5')

train()
K.clear_session()

Features

Our model was able to predict the 44 characters in the ASL with a prediction accuracy >95%.

Features that can be added:

  • Deploy the project on cloud and create an API for using it.
  • Increase the vocabulary of our model
  • Incorporate feedback mechanism to make the model more robust
  • Add more sign languages

Status

Project is: finished. Our team was the winner of the UNT Hackaton 2019. You can find the our final submission post on devpost. If you would like us to implement the project end-to-end for you please book a session.

Contact

Created by me with my teammates Siddharth Oza, Ashish Sharma, and Manish Shukla.

If you loved what you read here and feel like we can collaborate to produce some exciting stuff, or if you just want to shoot a question, please feel free to connect with me on email, LinkedIn, or Twitter. My other projects can be found here.

GitHub Twitter