notes.txt

Comp vision notes:
1. dlib + OpenCV allows for face classification + feature classification. See
https://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_face_detection.html#gsc.tab=0
for a project that does eye detection. Can then measure angle + size of eyes to determine what to do with glasses

2. Alternatively, find face classification algorithm that measures head tilt some other way, perform rotation on the image
to frontalize it, then do Haar-like rectangles to find eyes. Scale glasses and place them, and then rotate glasses and image
back to original orientation


Haar cascade vs deepnet solution:
 -- for detecting eyes and faces, will want to test differences between haar cascade and deepnet


 Notes for initializing AWS instance and connecting to it:
  - For connecting: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html
  - The command to connect: ssh -i ryan-computer-vision-key.pem ubuntu@ec2-52-55-242-160.compute-1.amazonaws.com
  - To do SCP: scp -i ryan-computer-vision-key.pem file_here ubuntu@ec2-52-55-242-160.compute-1.amazonaws.com:destination_here

Questions to ask:
 - Would it be reasonable to compare a Haar Cascade classifier vs CNN classifier on faces and eyes and see which one does better?
 - Followup: What's the standard way to measure accuracy in this case? Would it be a simple testing accuracy on face vs. non-face images?
   How about testing on images with multiple faces on it? (or should i not do this)
 - Should i be adhering to certain image dimensions when training the classifier? I've seen make negative images 100x100 and faces 50x50,
   but unsure how reliable/important this is
 - 

 Manually place eyes on set of like 20 images, don't use images in training set
 Test each classifier on the perfect images, create some sort of distance function (distance from centers)
 (intersection/union of bounding boxes -- IOU)


Game plan:
 - Use both the Haar cascade and CNN tensorflow out-of-box solutions to find eyes. Test which one is more accurate (somehow)
   - while it's accepted that CNNs are usually better nowadays, are their bounding boxes more accurate, or just the rate at which they recognize?
   - In review: mention that generally known that CNNs are more accurate for object detection, but question the accuracy of the detection? Can a Haar Cascade detector locate eyes more accurately than a CNN?
 - Take the better method and use it to place the glasses. Measure this against a small set of 20 images that are perfectly labelled.

Glasses:
 Regular: Width 600px, height 205px. 76px from top of image to bottom of nose frame
 Sunglasses: Width 600px, height 209px


Interesting assumption: head always taller than wider?

Bounding boxes on ellipse:
https://stackoverflow.com/questions/87734/how-do-you-calculate-the-axis-aligned-bounding-box-of-an-ellipse

REferences:
CNN face detection (using FaceNet): http://jekel.me/2017/How-to-detect-faces-using-facenet/
Haar cascade face + eyes detection: opencv tutorial, 
comparision of CNN vs Haar: https://dzone.com/articles/cnn-vs-cascade-classifiers-for-object-detection

Face database: FDDB: Face Detection Data Set and Benchmark at http://vis-www.cs.umass.edu/fddb/
Uses Faces in the Wild for faces, then provides annotations

in fold 2, 285 images with total of 519 faces
first image in fold 2:
2002/07/28/big/img_416
[(53.8011831203448, 21.353383418584098, 88.95143375931039, 121.82969316283182), (285.3230637867992, 10.930149311060417, 71.56117442640164, 105.08978737787916), (346.7922990576719, 79.57017142820564, 36.134583884656195, 47.90292114358871)]

FaceNet: https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Schroff_FaceNet_A_Unified_2015_CVPR_paper.pdf
dlib: sliding window histogram-of-oriented-gradients based object detector.
 http://dlib.net/face_detection_ex.cpp.html <---- explains that upsampling resizes image by 2x. Dlib library looks for faces that are about 80x80 or larger. So upsample once to detect 40x40 faces too

****************************************************
FACE DETECTION RESULTS:
The standard used for whether or not a face was correctly detected was that the center of the detected face
must be within 40% of the height and width of the labeled face to true center. Additionally, the width and height of
bounding box of detected face must have been > .5 and < 1.5 times the width and height of true bounding box. These
parameters were chosen after experimenting with many values. Making ranges too wide allowed for non-correctly detected
faces to be accidentally detected (false positives), while making ranges too small missed detections that were correct.

Face detection accuracies were reported with these ranges used.

For Haar cascade detector:
explaining params: http://www.bogotobogo.com/python/OpenCV_Python/python_opencv3_Image_Object_Detection_Face_Detection_Haar_Cascade_Classifiers.php

scale factor values: (time is hh:mm:ss.ms)

1.3 and 5, 1329/2067 faces, 25 false, 0.6429608127721336, 0:00:21.676171
1.2 and 5, 1427/2067 faces, 58, 0.6903725205611998, 0:00:30.377876
1.1 and 5, 1544/2067 faces, 155, 0.7469762941461054, 0:00:48.634285
1.05 and 5, 1627/2067 faces, 401, 0.7871311078858249, 0:01:29.261559
1.01 and 5, 1815/2067 faces, 1987, 0.8780841799709724, 0:06:51.084492

1.3 and 4, 1398/2067 faces, 43, 0.6763425253991292, 0:00:25.266321
1.2 and 4, 1479/2067 faces, 95, 0.7155297532656023, 0:00:30.429258
1.1 and 4, 1580/2067 faces, 213, 0.764392839864538, 0:01:07.422060
1.05 and 4, 1650/2067 faces, 527, 0.7982583454281568, 0:01:41.136731
1.01 and 4, 1848/2067 faces, 2287, 0.8940493468795355, 0:08:23.776399

1.3 and 3, 1447/2067 faces, 74, 0.7000483792936623, 0:00:22.047011
1.2 and 3, 1519/2067 faces, 172, 0.7348814707305273, 0:00:29.524132
1.1 and 3, 1610/2067 faces, 324, 0.7789066279632317, 0:00:49.368626
1.05 and 3, 1686/2067 faces, 718, 0.8156748911465893, 0:01:29.713371
1.01 and 3, 1886/2067 faces, 2736, 0.9124334784712144, 0:06:57.774940

1.3 and 2, 1503/2067 faces, 159, 0.7271407837445574, 0:00:22.130912
1.2 and 2, 1560/2067 faces, 291, 0.7547169811320755, 0:00:38.926712
1.1 and 2, 1657/2067 faces, 531, 0.8016448959845186, 0:00:51.998133
1.05 and 2, 1721/2067 faces, 1053, 0.8326076439283987, 0:01:29.194843
1.01 and 2, 1932/2067 faces, 3432, 0.9346879535558781, 0:08:04.406453

CNN detector: (steps were 0.6, 0.7, 0.7)
factor .800: 1629/2067 faces, 22, 0.7880986937590712, 0:16:34.575368
factor .750: 1649/2067 faces, 19, 0.7977745524915336, 0:16:39.013508
factor .709: 1608/2067 faces, 22, 0.7779390420899854, 0:16:10.188261
factor .650: 1585/2067 faces, 19, 0.7668118045476536, 0:16:07.263717

CNN detector: (steps now 0.5, 0.6, 0.7)
factor .800: 1704/2067, 38, 0.8243831640058055, 0:17:22.699501 
factor .750: 1688/2067, 35, 0.8166424770198355, 0:16:42.087812
factor .709: 


HOG detector:
upsample 0: 1478/2067, 5, 0.7150459603289792, 0:00:26.807566
upsample 1: 1603/2067, 9, 0.7755200774068699, 0:01:32.913658
upsample 2: 1615/2067, 13, 0.7813255926463474, 0:06:16.461375

avg height: 142.58539351061276
avg width: 94.11600875170973
over 5171 faces

http://vis-www.cs.umass.edu/fddb/fddb.pdf


On manual:

Haar 1.1, 5:
found 24 out of 24 faces in 
accuracy: 1.0
found 24/24 faces
total false pos: 2
accuracy: 1.0
Time elapsed (hh:mm:ss.ms) 0:00:00.872882

Hog 1 upsample:
found 24 out of 24 faces in 
accuracy: 1.0
found 24/24 faces
total false pos: 0
accuracy: 1.0
Time elapsed (hh:mm:ss.ms) 0:00:01.711387

CNN .800 scale, .6,.7,.7:
found 23 out of 24 faces in 
accuracy: 0.9583333333333334
found 23/24 faces
total false pos: 0
accuracy: 0.9583333333333334
Time elapsed (hh:mm:ss.ms) 0:00:39.536741