diff --git a/03_ethics.ipynb b/03_ethics.ipynb index 285852a32..a87441fff 100644 --- a/03_ethics.ipynb +++ b/03_ethics.ipynb @@ -359,7 +359,7 @@ "\n", "> : One important signal to classify the main topic of a video is the channel it comes from. For example, a video uploaded to a cooking channel is very likely to be a cooking video. But how do we know what topic a channel is about? Well… in part by looking at the topics of the videos it contains! Do you see the loop? For example, many videos have a description which indicates what camera was used to shoot the video. As a result, some of these videos might get classified as videos about “photography.” If a channel has such a misclassified video, it might be classified as a “photography” channel, making it even more likely for future videos on this channel to be wrongly classified as “photography.” This could even lead to runaway virus-like classifications! One way to break this feedback loop is to classify videos with and without the channel signal. Then when classifying the channels, you can only use the classes obtained without the channel signal. This way, the feedback loop is broken.\n", "\n", - "There are positive examples of people and organizations attempting to combat these problems. Evan Estola, lead machine learning engineer at Meetup, [discussed the example](https://www.youtube.com/watch?v=MqoRzNhrTnQ) of men expressing more interest than women in tech meetups. taking gender into account could therefore cause Meetup’s algorithm to recommend fewer tech meetups to women, and as a result, fewer women would find out about and attend tech meetups, which could cause the algorithm to suggest even fewer tech meetups to women, and so on in a self-reinforcing feedback loop. So, Evan and his team made the ethical decision for their recommendation algorithm to not create such a feedback loop, by explicitly not using gender for that part of their model. It is encouraging to see a company not just unthinkingly optimize a metric, but consider its impact. According to Evan, \"You need to decide which feature not to use in your algorithm... the most optimal algorithm is perhaps not the best one to launch into production.\"\n", + "There are positive examples of people and organizations attempting to combat these problems. Evan Estola, lead machine learning engineer at Meetup, [discussed the example](https://www.youtube.com/watch?v=MqoRzNhrTnQ) of men expressing more interest than women in tech meetups. Taking gender into account could therefore cause Meetup’s algorithm to recommend fewer tech meetups to women, and as a result, fewer women would find out about and attend tech meetups, which could cause the algorithm to suggest even fewer tech meetups to women, and so on in a self-reinforcing feedback loop. So, Evan and his team made the ethical decision for their recommendation algorithm to not create such a feedback loop, by explicitly not using gender for that part of their model. It is encouraging to see a company not just unthinkingly optimize a metric, but consider its impact. According to Evan, \"You need to decide which feature not to use in your algorithm... the most optimal algorithm is perhaps not the best one to launch into production.\"\n", "\n", "While Meetup chose to avoid such an outcome, Facebook provides an example of allowing a runaway feedback loop to run wild. Like YouTube, it tends to radicalize users interested in one conspiracy theory by introducing them to more. As Renee DiResta, a researcher on proliferation of disinformation, [writes](https://www.fastcompany.com/3059742/social-network-algorithms-are-distorting-reality-by-boosting-conspiracy-theories):" ] diff --git a/05_pet_breeds.ipynb b/05_pet_breeds.ipynb index 2f39d3ad6..84101e824 100644 --- a/05_pet_breeds.ipynb +++ b/05_pet_breeds.ipynb @@ -1010,7 +1010,7 @@ " return torch.where(targets==1, 1-inputs, inputs).mean()\n", "```\n", "\n", - "Just as we moved from sigmoid to softmax, we need to extend the loss function to work with more than just binary classification—it needs to be able to classify any number of categories (in this case, we have 37 categories). Our activations, after softmax, are between 0 and 1, and sum to 1 for each row in the batch of predictions. Our targets are integers between 0 and 36. Furthermore, cross-entropy loss generalizes our binary classification loss and allows for more than one correct label per example (which is called multi-label classificaiton, which we will discuss in Chapter 6).\n", + "Just as we moved from sigmoid to softmax, we need to extend the loss function to work with more than just binary classification—it needs to be able to classify any number of categories (in this case, we have 37 categories). Our activations, after softmax, are between 0 and 1, and sum to 1 for each row in the batch of predictions. Our targets are integers between 0 and 36. Furthermore, cross-entropy loss generalizes our binary classification loss and allows for more than one correct label per example (which is called multi-label classification, which we will discuss in Chapter 6).\n", "\n", "In the binary case, we used `torch.where` to select between `inputs` and `1-inputs`. When we treat a binary classification as a general classification problem with two categories, it actually becomes even easier, because (as we saw in the previous section) we now have two columns, containing the equivalent of `inputs` and `1-inputs`. Since there is only one correct label per example, all we need to do is select the appropriate column (as opposed to multiplying multiple probabilities). Let's try to implement this in PyTorch. For our synthetic 3s and 7s example, let's say these are our labels:" ] diff --git a/07_sizing_and_tta.ipynb b/07_sizing_and_tta.ipynb index f9f346fa5..41e60a781 100644 --- a/07_sizing_and_tta.ipynb +++ b/07_sizing_and_tta.ipynb @@ -914,9 +914,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's practice our paper-reading skills to try to interpret this. \"This maximum\" is refering to the previous part of the paragraph, which talked about the fact that 1 is the value of the label for the positive class. So it's not possible for any value (except infinity) to result in 1 after sigmoid or softmax. In a paper, you won't normally see \"any value\" written; instead it will get a symbol, which in this case is $z_k$. This shorthand is helpful in a paper, because it can be referred to again later and the reader will know what value is being discussed.\n", + "Let's practice our paper-reading skills to try to interpret this. \"This maximum\" is referring to the previous part of the paragraph, which talked about the fact that 1 is the value of the label for the positive class. So it's not possible for any value (except infinity) to result in 1 after sigmoid or softmax. In a paper, you won't normally see \"any value\" written; instead it will get a symbol, which in this case is $z_k$. This shorthand is helpful in a paper, because it can be referred to again later and the reader will know what value is being discussed.\n", "\n", - "Then it says \"if $z_y\\gg z_k$ for all $k\\neq y$.\" In this case, the paper immediately follows the math with an English description, which is handy because you can just read that. In the math, the $y$ is refering to the target ($y$ is defined earlier in the paper; sometimes it's hard to find where symbols are defined, but nearly all papers will define all their symbols somewhere), and $z_y$ is the activation corresponding to the target. So to get close to 1, this activation needs to be much higher than all the others for that prediction.\n", + "Then it says \"if $z_y\\gg z_k$ for all $k\\neq y$.\" In this case, the paper immediately follows the math with an English description, which is handy because you can just read that. In the math, the $y$ is referring to the target ($y$ is defined earlier in the paper; sometimes it's hard to find where symbols are defined, but nearly all papers will define all their symbols somewhere), and $z_y$ is the activation corresponding to the target. So to get close to 1, this activation needs to be much higher than all the others for that prediction.\n", "\n", "Next, consider the statement \"if the model learns to assign full probability to the ground-truth label for each training example, it is not guaranteed to generalize.\" This is saying that making $z_y$ really big means we'll need large weights and large activations throughout our model. Large weights lead to \"bumpy\" functions, where a small change in input results in a big change to predictions. This is really bad for generalization, because it means just one pixel changing a bit could change our prediction entirely!\n", "\n", diff --git a/08_collab.ipynb b/08_collab.ipynb index 334ec54e4..39853bb9c 100644 --- a/08_collab.ipynb +++ b/08_collab.ipynb @@ -1935,7 +1935,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now that we have succesfully trained a model, let's see how to deal with the situation where we have no data for a user. How can we make recommendations to new users?" + "Now that we have successfully trained a model, let's see how to deal with the situation where we have no data for a user. How can we make recommendations to new users?" ] }, { diff --git a/09_tabular.ipynb b/09_tabular.ipynb index b767040df..8b536dc2b 100644 --- a/09_tabular.ipynb +++ b/09_tabular.ipynb @@ -8330,7 +8330,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "One thing that makes this harder to interpret is that there seem to be some variables with very similar meanings: for example, `ProductGroup` and `ProductGroupDesc`. Let's try to remove any redundent features. " + "One thing that makes this harder to interpret is that there seem to be some variables with very similar meanings: for example, `ProductGroup` and `ProductGroupDesc`. Let's try to remove any redundant features. " ] }, { @@ -8811,7 +8811,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "`prediction` is simply the prediction that the random forest makes. `bias` is the prediction based on taking the mean of the dependent variable (i.e., the *model* that is the root of every tree). `contributions` is the most interesting bit—it tells us the total change in predicition due to each of the independent variables. Therefore, the sum of `contributions` plus `bias` must equal the `prediction`, for each row. Let's look just at the first row:" + "`prediction` is simply the prediction that the random forest makes. `bias` is the prediction based on taking the mean of the dependent variable (i.e., the *model* that is the root of every tree). `contributions` is the most interesting bit—it tells us the total change in prediction due to each of the independent variables. Therefore, the sum of `contributions` plus `bias` must equal the `prediction`, for each row. Let's look just at the first row:" ] }, { @@ -9911,7 +9911,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We have dicussed two approaches to tabular modeling: decision tree ensembles and neural networks. We've also mentioned two different decision tree ensembles: random forests, and gradient boosting machines. Each is very effective, but each also has compromises:\n", + "We have discussed two approaches to tabular modeling: decision tree ensembles and neural networks. We've also mentioned two different decision tree ensembles: random forests, and gradient boosting machines. Each is very effective, but each also has compromises:\n", "\n", "- *Random forests* are the easiest to train, because they are extremely resilient to hyperparameter choices and require very little preprocessing. They are very fast to train, and should not overfit if you have enough trees. But they can be a little less accurate, especially if extrapolation is required, such as predicting future time periods.\n", "\n", diff --git a/11_midlevel_data.ipynb b/11_midlevel_data.ipynb index cce5f69ad..4bcb29e36 100644 --- a/11_midlevel_data.ipynb +++ b/11_midlevel_data.ipynb @@ -1087,7 +1087,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For each image our tranform will, with a probability of 0.5, draw an image from the same class and return a `SiameseImage` with a true label, or draw an image from another class and return a `SiameseImage` with a false label. This is all done in the private `_draw` function. There is one difference between the training and validation sets, which is why the transform needs to be initialized with the splits: on the training set we will make that random pick each time we read an image, whereas on the validation set we make this random pick once and for all at initialization. This way, we get more varied samples during training, but always the same validation set:" + "For each image our transform will, with a probability of 0.5, draw an image from the same class and return a `SiameseImage` with a true label, or draw an image from another class and return a `SiameseImage` with a false label. This is all done in the private `_draw` function. There is one difference between the training and validation sets, which is why the transform needs to be initialized with the splits: on the training set we will make that random pick each time we read an image, whereas on the validation set we make this random pick once and for all at initialization. This way, we get more varied samples during training, but always the same validation set:" ] }, { diff --git a/17_foundations.ipynb b/17_foundations.ipynb index 016ec4e0a..334ce432f 100644 --- a/17_foundations.ipynb +++ b/17_foundations.ipynb @@ -550,7 +550,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "What if have different means for each row of the matrix? in that case you will need to broadcast a vector to a matrix." + "What if have different means for each row of the matrix? In that case you will need to broadcast a vector to a matrix." ] }, {