---
**Vehicle Detection Project**
---
The goals / steps of this project are the following:
- Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier.
- Apply a color transform and append binned color features, as well as histograms of color, to HOG feature vector.
- Normalize features and randomize a selection for training and testing.
- Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
- Run pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
- Estimate a bounding box for vehicles detected.
The main files are:
- Project_Report.md -- The README for the project
- Classifier.ipynb -- Notebook containing code for feature extraction and classifier training.
- Detect.ipynb -- Notebook containing code for car detection in images and videos.
- FeatureVis.ipynb -- Utility notebook to visualize HOG, spatial and color histogram features for the report.
- project_video_out.mp4 -- Output Project video
The rest of the code is in python files below:
- feature_ex.py -- Functions to extract HOG and color and spatial features.
- default_params.py -- Parameters for the project e.g., color space, hog feature parameters etc.
- heat.py -- Functions for adding heat maps.
- utils.py -- Utilities e.g., color conversion.
- moredata.py -- Functions for the Udacity augmented data (unused).
The model and scaler are saved as follows:
- vehicle_model.pkl -- The SVM model saved as a pickle file
- vehicle_scaler.pkl -- The Standard scaler saved as a pickle file
I thank the previous reviewer and the forum mentors for their valuable feedback, particularly on the need to convert frames to BGR space in the video pipeline. While I appreciate the previous reviewer's feedback, the color conversion turned out to be the main reason for the unstable windows in my previous implementation.
Much of the code comes from the class lectures. The idea of smoothing the heatmaps was discussed in a forum, and I based my implementation on that.
The images in the vehicle and non-vehicle datasets were used for training a classifier.
An example of a vehicle image is below.
An example of a non-vehicle image is below.
Throughout the code I used the OpenCV API cv2.imread to read the test images. This API reads images in the BGR color space. I used the YCrCb color space: it seemed to work better on some test images, so I decided to use it. Further, vehicles can appear in any color, so color spaces such as HSV may not work well. An example image in YCrCb is shown below:
For feature selection, I used the following parameters:
- Spatial binning of color with size `(32, 32)`. An example histogram of spatial features for an image from the training set is shown below:
- Color histogram features, with `32 bins` and `range (0, 256)`. An example histogram of these features is shown below:
- HOG features, with parameters `orientations = 9`, `pixels_per_cell = 8` and `cells_per_block = 2`. The HOG features were computed using `skimage.feature.hog()` on all color channels. The following is a representation for each channel.
The code for feature extraction is in the file `feature_ex.py`.
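As a rough illustration of the color histogram part only, the sketch below concatenates one 32-bin histogram per channel using NumPy. The function name `color_hist_sketch` and the random test patch are assumptions made for this example; the project's own version lives in feature_ex.py and may differ.

```python
import numpy as np


def color_hist_sketch(img, nbins=32, bins_range=(0, 256)):
    """Concatenate per-channel histograms of an H x W x 3 image."""
    channel_hists = [
        np.histogram(img[:, :, c], bins=nbins, range=bins_range)[0]
        for c in range(img.shape[2])
    ]
    return np.concatenate(channel_hists)


# Hypothetical 64x64 three-channel patch filled with random values
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)

hist_features = color_hist_sketch(patch)
print(hist_features.shape)
```

With three channels and 32 bins each, this part contributes 96 values to the feature vector.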
There were 8792 vehicle images and 8968 non-vehicle images. After feature extraction, each image resulted in a feature vector with 8460 features.
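The figure of 8460 can be checked from the parameters listed earlier. The arithmetic below assumes 64x64 three-channel inputs (the size to which search windows are resized) and HOG blocks that advance one cell at a time.

```python
# Parameters stated in this report
image_size = 64          # assumed input size (64x64 patches)
channels = 3
spatial_size = (32, 32)
hist_bins = 32
orientations = 9
pixels_per_cell = 8
cells_per_block = 2

# Spatial binning: every pixel of the resized image, for each channel
spatial_len = spatial_size[0] * spatial_size[1] * channels

# Color histogram: one histogram per channel
hist_len = hist_bins * channels

# HOG: blocks slide one cell at a time over the grid of cells
cells_per_side = image_size // pixels_per_cell
blocks_per_side = cells_per_side - cells_per_block + 1
hog_len_per_channel = blocks_per_side ** 2 * cells_per_block ** 2 * orientations
hog_len = hog_len_per_channel * channels

total_len = spatial_len + hist_len + hog_len
print(spatial_len, hist_len, hog_len, total_len)  # 3072 96 5292 8460
```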
The features were normalized using `StandardScaler`.
The features were split randomly in an 80% / 20% ratio into training and validation sets.
This data set was then used to train a linear SVM classifier using `LinearSVC` from `sklearn.svm`.
The model and the scaler were then saved to 'vehicle_model.pkl' and 'vehicle_scaler.pkl' respectively.
With the features selected as above, I could train the classifier to an accuracy of `99.1%` on the validation set. This accuracy was high enough that I did not explore other parameter settings.
The code for classifier training is part of Classifier.ipynb.
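The training steps described above can be sketched roughly as follows. The synthetic feature arrays, the random seeds and the use of `pickle.dumps` (instead of writing the real .pkl files) are assumptions for illustration; this is not the code from Classifier.ipynb.

```python
import pickle

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-ins for the extracted feature vectors (assumption)
rng = np.random.default_rng(0)
car_features = rng.normal(loc=1.0, size=(200, 20))
notcar_features = rng.normal(loc=-1.0, size=(200, 20))

X = np.vstack([car_features, notcar_features])
y = np.hstack([np.ones(len(car_features)), np.zeros(len(notcar_features))])

# Normalize the features, then hold out 20% for validation
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
X_train, X_val, y_train, y_val = train_test_split(
    X_scaled, y, test_size=0.2, random_state=0
)

# Train the linear SVM and measure validation accuracy
clf = LinearSVC()
clf.fit(X_train, y_train)
val_accuracy = clf.score(X_val, y_val)

# Serialize model and scaler in memory; the project saves them to .pkl files
model_bytes = pickle.dumps(clf)
scaler_bytes = pickle.dumps(scaler)
restored_clf = pickle.loads(model_bytes)
```

The same scaler has to be applied to the features of every search window at detection time, which is why it is saved alongside the model.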
In order to detect vehicles in the images, I used a sliding window approach. Windows of various sizes were slid across the image, and the contents of each window were resized to (64, 64). Feature extraction as explained above was then applied, and the trained classifier was used to predict whether a vehicle was present in the window.
The major part of the project involved properly implementing the sliding window search for cars in images. The goal was to minimize the false positives and at the same time make sure that all the cars were detected in all images and frames of the videos.
For example, in my initial implementation, I used four groups of sliding windows, with the y-coordinate ranges and overlaps shown below.
Window Scale | (y_start, y_stop) | Overlap |
---|---|---|
32x32 | (400, 528) | None |
32x32 | (500, 628) | 25% |
96x96 | (400, 656) | 50% |
128x128 | (400, 656) | 50% |
An image with these windows drawn is illustrated below:
Using the raw windows with positive classifications resulted in many false positives, as shown below.
I then created a heatmap and applied a threshold to ensure that at least two groups were overlapping, and this eliminated the false positives. However, there were images where the white car was not detected; see the figure below.
After analyzing the results, I found that it was not a problem with the classifier per se, but rather a problem of incorrect offsets of the sliding windows. After a few experiments, I arrived at the following offsets and sizes for the sliding windows.
Window Scale | (y_start, y_stop) | Overlap |
---|---|---|
96x96 | (400, 656) | 50% |
96x64 | (404, 468) | 50% |
128x128 | (400, 656) | 50% |
128x64 | (404, 468) | 50% |
The detailed code for creating these sliding windows is in the function `create_sliding_windows` in the notebook `Detect.ipynb`.
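For readers who want a feel for how such a window grid is generated, here is a minimal sketch. It is not the notebook's `create_sliding_windows`; the function name, its signature and the 1280-pixel frame width are assumptions for this example. The scale, y-range and overlap are taken from the 96x96 row of the table above.

```python
def sliding_windows_sketch(x_range, y_range, window, overlap):
    """Return ((x1, y1), (x2, y2)) boxes that tile the given region.

    window is (width, height) in pixels; overlap is a fraction in [0, 1).
    """
    win_w, win_h = window
    step_x = max(1, int(win_w * (1 - overlap)))
    step_y = max(1, int(win_h * (1 - overlap)))
    boxes = []
    # Only keep windows that fit entirely inside the region
    for y in range(y_range[0], y_range[1] - win_h + 1, step_y):
        for x in range(x_range[0], x_range[1] - win_w + 1, step_x):
            boxes.append(((x, y), (x + win_w, y + win_h)))
    return boxes


# 96x96 windows, y in (400, 656), 50% overlap, assumed frame width of 1280
boxes = sliding_windows_sketch((0, 1280), (400, 656), (96, 96), 0.5)
print(len(boxes), boxes[0], boxes[-1])
```

Restricting the y-range keeps the search away from the sky and the hood of the car, which reduces both runtime and false positives.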
An image with these windows is illustrated below:
Using these windows, no cars were missed in the images, as confirmed by the example below.
Also, the corresponding hot windows are shown below.
The thresholded heatmap is shown below:
The illustrations for test images are all in the notebook Detect.ipynb.
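The heatmap and threshold step described above can be sketched as below. The function names, the frame size of 720x1280 and the example boxes are assumptions for illustration; the project's own helpers are in heat.py and may differ.

```python
import numpy as np


def add_heat_sketch(heatmap, boxes):
    """Add 1 to every pixel covered by each positive window."""
    for (x1, y1), (x2, y2) in boxes:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap


def threshold_sketch(heatmap, threshold):
    """Zero out pixels covered by fewer than `threshold` windows."""
    result = heatmap.copy()
    result[result < threshold] = 0
    return result


# Two overlapping hot windows on a car and one isolated false positive
hot_windows = [
    ((800, 400), (896, 496)),
    ((848, 400), (944, 496)),
    ((100, 500), (164, 564)),
]
heat = add_heat_sketch(np.zeros((720, 1280), dtype=np.int32), hot_windows)
thresholded = threshold_sketch(heat, 2)
print(np.count_nonzero(heat), np.count_nonzero(thresholded))
```

Only the region where the two windows overlap survives a threshold of 2, so the isolated window is rejected.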
This image pipeline is implemented as the function `process_image` in Detect.ipynb.
The video pipeline implementation involved invoking the image pipeline functionality with each frame of the video. However, there were two modifications:
- Before feeding the frame to image pipeline, it was converted to BGR space.
- A moving average heatmap was maintained to have smooth vehicle detection.
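The color conversion in the first modification amounts to reversing the channel order. The sketch below assumes the video reader delivers frames in RGB order, which is not stated in this report; with OpenCV, `cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)` performs the same conversion.

```python
import numpy as np


def rgb_to_bgr_sketch(frame):
    """Reverse the channel axis of an H x W x 3 frame (RGB -> BGR)."""
    return np.ascontiguousarray(frame[:, :, ::-1])


# A tiny frame that is pure red in RGB order
frame_rgb = np.zeros((2, 2, 3), dtype=np.uint8)
frame_rgb[..., 0] = 255

frame_bgr = rgb_to_bgr_sketch(frame_rgb)
print(frame_bgr[0, 0])
```

Because the classifier was trained on images read with cv2.imread (BGR), skipping this step means the YCrCb conversion is applied to the wrong channel order.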
The second part, namely the moving average heatmap, was implemented as the class `VehicleTracker` in Detect.ipynb. As suggested by discussions in the forum, I used a deque of size `10` to maintain the heatmaps of the last 10 frames and then thresholded the sum. The details are in the member function `process_frame` of the `VehicleTracker` class.
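A minimal sketch of this idea is shown below. The class name `HeatmapSmootherSketch`, the method name `update` and the threshold of 5 are hypothetical; only the deque of 10 heatmaps and the thresholding of their sum come from the description above.

```python
from collections import deque

import numpy as np


class HeatmapSmootherSketch:
    """Keep the last `history` heatmaps and threshold their sum."""

    def __init__(self, history=10, threshold=5):
        # deque with maxlen drops the oldest heatmap automatically
        self.heatmaps = deque(maxlen=history)
        self.threshold = threshold  # hypothetical value

    def update(self, frame_heatmap):
        self.heatmaps.append(frame_heatmap)
        total = np.stack(list(self.heatmaps)).sum(axis=0)
        total[total < self.threshold] = 0
        return total


smoother = HeatmapSmootherSketch(history=10, threshold=5)
for i in range(12):
    heat = np.zeros((4, 4), dtype=np.int32)
    heat[0, 0] = 1          # detection present in every frame
    if i == 8:
        heat[3, 3] = 1      # false positive in a single frame
    smoothed = smoother.update(heat)

print(smoothed[0, 0], smoothed[3, 3])
```

A detection that persists across frames accumulates enough heat to pass the threshold, while a one-frame false positive does not.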
Here's a link to my video result
I was able to eliminate the false positives and detect cars in all frames of the videos using the methods described above. However, a number of improvements are possible, namely:
- Make the classifier robust to searches with different window sizes and offsets. One potential way to achieve this is to augment the training data with various cropped portions of the images, with the cropping corresponding to the window choices and overlaps.
- Use brightness augmentation of the data. I noticed that the training images have different lighting from the video.
- Another aspect to improve, apart from accuracy, is runtime. One obvious direction is to use the HOG subsampling method.