If you find any mistakes in The Kaggle Book, or if you have suggestions for improvements, please raise an issue in this repository or email us.
Here is how we define the cells:
- TP (true positives): These are located in the lower-right cell, containing examples that have been correctly predicted as positive ones.
- FP (false positives): These are located in the upper-right cell, containing examples that have been predicted as positive but are actually negative.
- FN (false negatives): These are located in the lower-left cell, containing examples that have been predicted as negative but are actually positive.
- TN (true negatives): These are located in the upper-left cell, containing examples that have been correctly predicted as negative ones.
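The cell positions above can be checked with a minimal sketch. It assumes binary labels encoded as 0 (negative) and 1 (positive), with rows indexed by the actual class and columns by the predicted class; the function name `confusion_cells` and the sample labels are made up for illustration.

```python
def confusion_cells(y_true, y_pred):
    """Return a 2x2 matrix laid out as [[TN, FP], [FN, TP]].

    Assumption: labels are 0 (negative) or 1 (positive);
    rows are actual classes, columns are predicted classes.
    """
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Hypothetical labels, chosen only to populate every cell.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

matrix = confusion_cells(y_true, y_pred)
tn, fp = matrix[0]  # upper-left, upper-right
fn, tp = matrix[1]  # lower-left, lower-right
print(matrix)
```

With this layout, the first row holds the actual negatives (TN, FP) and the second row holds the actual positives (FN, TP), matching the definitions listed above.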
Following is the correct formula for R squared (the "coefficient of determination"):
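As a rough sketch, assuming the standard definition `R^2 = 1 - SS_res / SS_tot` (residual sum of squares divided by total sum of squares around the mean of the targets), the score can be computed as follows. The function name `r_squared` and the sample values are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination under the standard definition.

    Assumption: R^2 = 1 - SS_res / SS_tot, where SS_tot is measured
    around the mean of y_true and is non-zero.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]

score = r_squared(y_true, y_pred)
print(round(score, 3))
```

Under this definition, perfect predictions score 1, always predicting the mean of the targets scores 0, and worse-than-mean predictions produce negative values.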
The first paragraph says: "A bad classifier can be spotted by the ROC curve appearing very similar, if not identical, to the diagonal of the chart, which represents the performance of a purely random classifier, as in the top right of Figure 5.3; ROC-AUC scores near 0.5 are considered to be almost random results."
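To illustrate why a score near 0.5 signals an uninformative classifier, here is a small sketch that computes ROC-AUC through its pairwise interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the sample data are assumptions for illustration; this is not the book's code.

```python
def roc_auc(y_true, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    Assumption: labels are 0/1 and both classes are present.
    Quadratic in the number of examples, so suited to small inputs only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]

# A classifier that gives every example the same score carries no information.
uninformative = roc_auc(y_true, [0.5, 0.5, 0.5, 0.5])

# A classifier that ranks every positive above every negative.
perfect = roc_auc(y_true, [0.1, 0.2, 0.8, 0.9])

# A partially correct ranking falls in between.
partial = roc_auc(y_true, [0.1, 0.4, 0.35, 0.8])
print(uninformative, perfect, partial)
```

The uninformative scorer lands exactly on 0.5, which corresponds to the diagonal of the ROC chart.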
The last line of the note should say: "...YOLO (https://arxiv.org/abs/1506.02640v1), Faster R-CNN (https://arxiv.org/abs/1506.01497v1), or SSD (https://arxiv.org/abs/1512.02325)."
Change "only 1,495" to "about 24,500".
Instead of feature_19 and feature_54, the correct features that appear the most different between the training/test split are cont14, cont4, and cont5.