Computer vision notes:
1. dlib + OpenCV allow face detection + facial feature detection. See
https://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_face_detection.html#gsc.tab=0
for a tutorial that does eye detection. Can then measure the angle + size of the eyes to determine how to place the glasses
2. Alternatively, find a face detection algorithm that measures head tilt some other way, rotate the image
to level the face, then use Haar-like features to find the eyes. Scale the glasses and place them, then rotate glasses and image
back to the original orientation
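A minimal sketch of the geometry behind both options, assuming the eye centers are already available as (x, y) pixel points with y growing downward. The function names are made up for illustration; nothing here comes from dlib or OpenCV.

```python
import math

def eye_line_angle_and_distance(left_eye, right_eye):
    """Return (angle in degrees, distance in px) of the line joining two eye centers."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.degrees(math.atan2(dy, dx))  # 0 when the eyes are level
    distance = math.hypot(dx, dy)             # usable as a scale reference for the glasses
    return angle, distance

def rotate_point(point, center, angle_degrees):
    """Rotate a point about a center using the standard 2D rotation matrix."""
    theta = math.radians(angle_degrees)
    x = point[0] - center[0]
    y = point[1] - center[1]
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return (xr + center[0], yr + center[1])

# Rotating by the negative of the measured angle levels the eye line;
# rotating by the positive angle afterwards restores the original tilt.
angle, distance = eye_line_angle_and_distance((0, 0), (10, 10))
leveled_right_eye = rotate_point((10, 10), (0, 0), -angle)
```

The same rotate_point call with +angle would map the placed glasses corners back to the tilted image.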
Haar cascade vs deepnet solution:
-- for detecting eyes and faces, will want to test differences between haar cascade and deepnet
Notes for initializing AWS instance and connecting to it:
- For connecting: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html
- The command to connect: ssh -i ryan-computer-vision-key.pem [email protected]
- To do SCP: scp -i ryan-computer-vision-key.pem file_here [email protected]:destination_here
Questions to ask:
- Would it be reasonable to compare a Haar cascade classifier vs a CNN classifier on faces and eyes and see which one does better?
- Follow-up: What's the standard way to measure accuracy in this case? Would it be a simple test accuracy on face vs. non-face images?
How about testing on images with multiple faces in them? (or should I not do this)
- Should I be adhering to certain image dimensions when training the classifier? I've seen people make negative images 100x100 and faces 50x50,
but unsure how reliable/important this is
- Manually label the eyes on a set of about 20 images; don't use images from the training set
Test each classifier against the hand-labeled images using some sort of distance function (distance between centers)
or the intersection over union of the bounding boxes (IoU)
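Both candidate measures are short to write down. A sketch, assuming boxes are stored as (x, y, width, height) with (x, y) the top-left corner; that layout is an assumption, not something fixed by these notes.

```python
import math

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_distance(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    acx, acy = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bcx, bcy = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return math.hypot(acx - bcx, acy - bcy)
```

IoU penalizes size errors as well as offsets, while center distance only sees offsets, so the two can rank detectors differently.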
Game plan:
- Use both the Haar cascade and the TensorFlow CNN out-of-the-box solutions to find eyes. Test which one is more accurate (somehow)
- while it's accepted that CNNs are usually better nowadays, are their bounding boxes more accurate, or just their detection rate?
- In review: mention that CNNs are generally known to be more accurate for object detection, but question how accurately they localize. Can a Haar cascade detector locate eyes more accurately than a CNN?
- Take the better method and use it to place the glasses. Measure this against a small set of 20 images that are perfectly labelled.
Glasses:
Regular: Width 600px, height 205px. 76px from top of image to bottom of nose frame
Sunglasses: Width 600px, height 209px
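A sketch of scaling the regular glasses image while keeping its aspect ratio, using the 600x205 size and the 76px nose-frame offset above. How the target width is chosen (for example from the eye distance or face width) is left open here; scale_glasses and its parameters are hypothetical names.

```python
def scale_glasses(target_width, src_width=600, src_height=205, bridge_offset=76):
    """Return (width, height, bridge offset) of the glasses scaled to target_width.

    The defaults are the regular glasses dimensions from these notes;
    values are floats and should be rounded before resizing an image.
    """
    scale = target_width / src_width
    return (float(target_width), src_height * scale, bridge_offset * scale)

# Example: glasses scaled to half their original width.
width, height, bridge = scale_glasses(300)
```

The scaled bridge offset gives the vertical distance from the top of the resized image to the bottom of the nose frame, which is the point to line up with the eyes.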
Interesting assumption: is a head always taller than it is wide?
Bounding boxes on ellipse:
https://stackoverflow.com/questions/87734/how-do-you-calculate-the-axis-aligned-bounding-box-of-an-ellipse
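For an ellipse with semi-axes a and b whose major axis is rotated by theta from the x-axis, the half-width of the axis-aligned box is sqrt(a^2 cos^2(theta) + b^2 sin^2(theta)) and the half-height is sqrt(a^2 sin^2(theta) + b^2 cos^2(theta)). A sketch, assuming the angle is in radians and the result is wanted as (x, y, width, height); the function name is made up.

```python
import math

def ellipse_bounding_box(cx, cy, semi_major, semi_minor, angle_radians):
    """Axis-aligned bounding box (x, y, w, h) of a rotated ellipse centered at (cx, cy)."""
    cos_t = math.cos(angle_radians)
    sin_t = math.sin(angle_radians)
    half_w = math.sqrt((semi_major * cos_t) ** 2 + (semi_minor * sin_t) ** 2)
    half_h = math.sqrt((semi_major * sin_t) ** 2 + (semi_minor * cos_t) ** 2)
    return (cx - half_w, cy - half_h, 2 * half_w, 2 * half_h)
```

With theta = 0 this reduces to the unrotated box of width 2a and height 2b, and with theta = pi/2 the two extents swap.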
References:
CNN face detection (using FaceNet): http://jekel.me/2017/How-to-detect-faces-using-facenet/
Haar cascade face + eyes detection: OpenCV tutorial (linked above)
Comparison of CNN vs Haar: https://dzone.com/articles/cnn-vs-cascade-classifiers-for-object-detection
Face database: FDDB: Face Detection Data Set and Benchmark at http://vis-www.cs.umass.edu/fddb/
Uses Faces in the Wild for faces, then provides annotations
in fold 2, 285 images with total of 519 faces
first image in fold 2:
2002/07/28/big/img_416
[(53.8011831203448, 21.353383418584098, 88.95143375931039, 121.82969316283182), (285.3230637867992, 10.930149311060417, 71.56117442640164, 105.08978737787916), (346.7922990576719, 79.57017142820564, 36.134583884656195, 47.90292114358871)]
FaceNet: https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Schroff_FaceNet_A_Unified_2015_CVPR_paper.pdf
dlib: sliding window histogram-of-oriented-gradients based object detector.
http://dlib.net/face_detection_ex.cpp.html <---- explains that upsampling resizes image by 2x. Dlib library looks for faces that are about 80x80 or larger. So upsample once to detect 40x40 faces too
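The upsampling arithmetic as a quick sketch: each upsample doubles the image, so the smallest detectable face halves. It assumes the roughly 80x80 base size quoted above; the function name is made up and this does not call dlib.

```python
def min_detectable_face(upsample_count, base_size=80):
    """Smallest face side (px, in the original image) found after upsampling 2x upsample_count times."""
    return base_size / (2 ** upsample_count)

# Smallest detectable face side for 0, 1 and 2 upsamples.
sizes = [min_detectable_face(n) for n in range(3)]
```

The cost is that every upsample quadruples the number of pixels the sliding window has to cover.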
****************************************************
FACE DETECTION RESULTS:
The standard used for whether a face was correctly detected: the center of the detected face must lie within
40% of the labeled face's width (horizontally) and height (vertically) of the true center. Additionally, the width and height of
the detected bounding box must each be > 0.5 and < 1.5 times the width and height of the true bounding box. These
parameters were chosen after experimenting with many values. Making the ranges too wide let incorrect detections
be accepted as correct (false positives), while making the ranges too narrow rejected detections that were correct.
The face detection accuracies below were reported with these ranges.
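The matching rule can be sketched as a single predicate. This assumes boxes are (x, y, width, height) with a top-left origin, and reads the 40% rule as horizontal offset against labeled width and vertical offset against labeled height. The function and parameter names are illustrative, not the ones used to produce the results below.

```python
def is_correct_detection(detected, labeled, center_tol=0.4, min_ratio=0.5, max_ratio=1.5):
    """Return True if a detected box matches a labeled box under the rule above."""
    dx, dy, dw, dh = detected
    lx, ly, lw, lh = labeled
    det_cx, det_cy = dx + dw / 2, dy + dh / 2
    lab_cx, lab_cy = lx + lw / 2, ly + lh / 2
    # Center offset must stay within 40% of the labeled width and height.
    center_ok = (abs(det_cx - lab_cx) <= center_tol * lw
                 and abs(det_cy - lab_cy) <= center_tol * lh)
    # Detected size must be strictly between 0.5x and 1.5x the labeled size.
    size_ok = (min_ratio < dw / lw < max_ratio
               and min_ratio < dh / lh < max_ratio)
    return center_ok and size_ok
```

Keeping the tolerances as parameters makes it easy to rerun the experiment over a range of values.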
For Haar cascade detector:
explaining params: http://www.bogotobogo.com/python/OpenCV_Python/python_opencv3_Image_Object_Detection_Face_Detection_Haar_Cascade_Classifiers.php
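scale factor and second detector parameter (presumably minNeighbors); each row: parameters, faces found/total, false detections, accuracy, time (h:mm:ss with fractional seconds)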
1.3 and 5, 1329/2067 faces, 25 false, 0.6429608127721336, 0:00:21.676171
1.2 and 5, 1427/2067 faces, 58, 0.6903725205611998, 0:00:30.377876
1.1 and 5, 1544/2067 faces, 155, 0.7469762941461054, 0:00:48.634285
1.05 and 5, 1627/2067 faces, 401, 0.7871311078858249, 0:01:29.261559
1.01 and 5, 1815/2067 faces, 1987, 0.8780841799709724, 0:06:51.084492
1.3 and 4, 1398/2067 faces, 43, 0.6763425253991292, 0:00:25.266321
1.2 and 4, 1479/2067 faces, 95, 0.7155297532656023, 0:00:30.429258
1.1 and 4, 1580/2067 faces, 213, 0.764392839864538, 0:01:07.422060
1.05 and 4, 1650/2067 faces, 527, 0.7982583454281568, 0:01:41.136731
1.01 and 4, 1848/2067 faces, 2287, 0.8940493468795355, 0:08:23.776399
1.3 and 3, 1447/2067 faces, 74, 0.7000483792936623, 0:00:22.047011
1.2 and 3, 1519/2067 faces, 172, 0.7348814707305273, 0:00:29.524132
1.1 and 3, 1610/2067 faces, 324, 0.7789066279632317, 0:00:49.368626
1.05 and 3, 1686/2067 faces, 718, 0.8156748911465893, 0:01:29.713371
1.01 and 3, 1886/2067 faces, 2736, 0.9124334784712144, 0:06:57.774940
1.3 and 2, 1503/2067 faces, 159, 0.7271407837445574, 0:00:22.130912
1.2 and 2, 1560/2067 faces, 291, 0.7547169811320755, 0:00:38.926712
1.1 and 2, 1657/2067 faces, 531, 0.8016448959845186, 0:00:51.998133
1.05 and 2, 1721/2067 faces, 1053, 0.8326076439283987, 0:01:29.194843
1.01 and 2, 1932/2067 faces, 3432, 0.9346879535558781, 0:08:04.406453
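The rows above trade detection rate against false detections. One way to put both on a single scale is to derive recall and precision from the counts. A sketch, assuming "faces found" counts only correct detections and the false count covers every other detection; the function name is made up.

```python
def detection_metrics(found, false_detections, total_faces):
    """Return (recall, precision) from the counts reported in one table row."""
    recall = found / total_faces if total_faces else 0.0
    detections = found + false_detections
    precision = found / detections if detections else 0.0
    return recall, precision

# First and last Haar rows: scale factor 1.3 with 5, and 1.01 with 2.
strict = detection_metrics(1329, 25, 2067)
loose = detection_metrics(1932, 3432, 2067)
```

Recall here is the same number as the accuracy column; precision shows how quickly the small scale factors flood the output with false detections.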
CNN detector: (steps were 0.6, 0.7, 0.7)
factor .800: 1629/2067 faces, 22, 0.7880986937590712, 0:16:34.575368
factor .750: 1649/2067 faces, 19, 0.7977745524915336, 0:16:39.013508
factor .709: 1608/2067 faces, 22, 0.7779390420899854, 0:16:10.188261
factor .650: 1585/2067 faces, 19, 0.7668118045476536, 0:16:07.263717
CNN detector: (steps now 0.5, 0.6, 0.7)
factor .800: 1704/2067, 38, 0.8243831640058055, 0:17:22.699501
factor .750: 1688/2067, 35, 0.8166424770198355, 0:16:42.087812
factor .709:
HOG detector:
upsample 0: 1478/2067, 5, 0.7150459603289792, 0:00:26.807566
upsample 1: 1603/2067, 9, 0.7755200774068699, 0:01:32.913658
upsample 2: 1615/2067, 13, 0.7813255926463474, 0:06:16.461375
avg height: 142.58539351061276
avg width: 94.11600875170973
over 5171 faces
http://vis-www.cs.umass.edu/fddb/fddb.pdf
On the manually labeled images:
Haar 1.1, 5:
found 24 out of 24 faces in
accuracy: 1.0
found 24/24 faces
total false pos: 2
accuracy: 1.0
Time elapsed (hh:mm:ss.ms) 0:00:00.872882
Hog 1 upsample:
found 24 out of 24 faces in
accuracy: 1.0
found 24/24 faces
total false pos: 0
accuracy: 1.0
Time elapsed (hh:mm:ss.ms) 0:00:01.711387
CNN .800 scale, .6,.7,.7:
found 23 out of 24 faces in
accuracy: 0.9583333333333334
found 23/24 faces
total false pos: 0
accuracy: 0.9583333333333334
Time elapsed (hh:mm:ss.ms) 0:00:39.536741