deploy: 9cc34a8

UBC-CS · Dec 3, 2024 · 621d72b
1 parent aef83b0
commit 621d72b

Showing 5 changed files with 257 additions and 10 deletions.
diff --git a/_sources/lectures/notes/final-exam-review-guiding-question.ipynb b/_sources/lectures/notes/final-exam-review-guiding-question.ipynb
@@ -99,7 +99,8 @@
     "- What are the advantages of cross-validation?\n",
     "- Why it's important to look at sub-scores of cross-validation?\n",
     "- What is the fundamental trade-off in supervised machine learning?\n",
-    "- What is the Golden rule in supervised machine learning? "
+    "- What is the Golden rule in supervised machine learning?\n",
+    "- Scenarios for data leakage "
    ]
   },
   {
@@ -113,8 +114,28 @@
     "- KNNs, SVM RBFs\n",
     "- Linear models \n",
     "- Random forests\n",
-    "- Grading Boosyinh, LGBM, CatBoost\n",
-    "- Stacking, averaging "
+    "- Grading Boosting, LGBM, CatBoost\n",
+    "- Stacking, averaging\n",
+    "\n",
+    "**Comparison of models**\n",
+    "| **Model**        | Parameters and hyperparameters | **Strengths**  | **Weaknesses**     |\n",
+    "|------------------|--------------------------------|---------------------------|---------------------------|\n",
+    "| **Decision Trees**               |  |  |  |\n",
+    "| **KNNs**              |  |  |  |\n",
+    "| **SVM RBF**            |  |  |  |\n",
+    "| **Linear models**         |  |  | | \n",
+    "| **Random forests**         |  |  | | \n",
+    "| **Gradient boosting**         |  |  | | \n",
+    "| **Stacking**         |  |  | | \n",
+    "| **Averaging**         |  |  | | \n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3b43fa4c-5691-4397-a057-a881d1d94179",
+   "metadata": {},
+   "source": [
+    "<br><br>"
    ]
   },
   {
@@ -133,6 +154,22 @@
     "- What are various data preprocessing steps such as scaling, OHE, ordinal encoding, and handling missing values. Why and when each step is necessary?"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "46551fbd-cf55-418c-867d-f8c7705fe7d1",
+   "metadata": {},
+   "source": [
+    "**`sklearn` Transformers** \n",
+    "| **Transformer**        | Hyperparameters | **When to use?** |\n",
+    "|------------------|--------------------------------|---------------------------|\n",
+    "| `SimpleImputer`  |  |  | \n",
+    "| `StandardScaler`              |  |  | \n",
+    "| `OneHotEncoder`            |  |  | \n",
+    "| `OrdinalEncoder`         |  |  | \n",
+    "| `CountVectorizer`        |  |  | \n",
+    "| `TransformedTargetRegressor` | | |\n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "bf30b454-9f43-481e-9b1c-da43031fc0d8",
@@ -586,7 +623,14 @@
     "\n",
     "- What makes hyperparameter optimization a hard problem?\n",
     "- What are two different tools provided by sklearn for hyperparameter optimization?  \n",
-    "- What is optimization bias? "
+    "- What is optimization bias?\n",
+    "\n",
+    "\n",
+    "| **Method**        | Strengths/Weaknesses | **When to use?** |\n",
+    "|------------------|--------------------------------|---------------------------|\n",
+    "| Nested for loops |  |  | \n",
+    "| Grid search  |  |  | \n",
+    "| Random search  |  |  | "
    ]
   },
   {
@@ -604,6 +648,31 @@
     "- What are advantages of RMSE or MAPE over MSE? "
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "7e11a3f7-0ec3-4306-a84e-43fe74869e20",
+   "metadata": {},
+   "source": [
+    "**Classification Metrics**\n",
+    "| **Metric**        | How to generate/calculate? | **When to use?** |\n",
+    "|------------------|--------------------------------|---------------------------|\n",
+    "| Accuracy  |  |  | \n",
+    "| Precision              |  |  | \n",
+    "| Recall          |  |  | \n",
+    "| F1-score         |  |  | \n",
+    "| AP score        |  |  | \n",
+    "| AUC        |  |  | \n",
+    "\n",
+    "\n",
+    "**Regression Metrics**\n",
+    "| **Metric**        | How to generate/calculate? | **When to use?** |\n",
+    "|------------------|--------------------------------|---------------------------|\n",
+    "| MSE  |  |  | \n",
+    "| RMSE              |  |  | \n",
+    "| r2 score          |  |  | \n",
+    "| MAPE         |  |  | "
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "a1e6c11b-ee26-4d37-87ea-2b6bd3560f60",

diff --git a/lectures/101-Giulia-lectures/07_linear-models.html b/lectures/101-Giulia-lectures/07_linear-models.html
@@ -1751,8 +1751,8 @@ <h4>Predicting with learned weights<a class="headerlink" href="#predicting-with-
 <p>In our case, for values for the coefficient of <em>boring</em> &lt; -3.36, the prediction would be negative.</p>
 <p>A linear model learns these coefficients or weights from the training data!</p>
 <p>So a linear classifier is a linear function of the input <code class="docutils literal notranslate"><span class="pre">X</span></code>, followed by a threshold.</p>
-<div class="amsmath math notranslate nohighlight" id="equation-86e21716-4141-4267-b699-8353676a67fc">
-<span class="eqno">(2)<a class="headerlink" href="#equation-86e21716-4141-4267-b699-8353676a67fc" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-21d4f5cc-1dfa-4e83-8a4a-f5e13992a2f9">
+<span class="eqno">(2)<a class="headerlink" href="#equation-21d4f5cc-1dfa-4e83-8a4a-f5e13992a2f9" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{split}
 z =&amp; w_1x_1 + \dots + w_dx_d + b\\
 =&amp; w^Tx + b

diff --git a/lectures/notes/07_linear-models.html b/lectures/notes/07_linear-models.html
@@ -1695,8 +1695,8 @@ <h4>Predicting with learned weights<a class="headerlink" href="#predicting-with-
 <p>In our case, for values for the coefficient of <em>boring</em> &lt; -3.36, the prediction would be negative.</p>
 <p>A linear model learns these coefficients or weights from the training data!</p>
 <p>So a linear classifier is a linear function of the input <code class="docutils literal notranslate"><span class="pre">X</span></code>, followed by a threshold.</p>
-<div class="amsmath math notranslate nohighlight" id="equation-90015cd2-dc20-466e-9266-611936d67486">
-<span class="eqno">(1)<a class="headerlink" href="#equation-90015cd2-dc20-466e-9266-611936d67486" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-f13e2e3a-42f5-4848-8fe9-cf2e92698731">
+<span class="eqno">(1)<a class="headerlink" href="#equation-f13e2e3a-42f5-4848-8fe9-cf2e92698731" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{split}
 z =&amp; w_1x_1 + \dots + w_dx_d + b\\
 =&amp; w^Tx + b

diff --git a/lectures/notes/final-exam-review-guiding-question.html b/lectures/notes/final-exam-review-guiding-question.html
@@ -521,6 +521,7 @@ <h3>ML fundamentals<a class="headerlink" href="#ml-fundamentals" title="Link to
 <li><p>Why it’s important to look at sub-scores of cross-validation?</p></li>
 <li><p>What is the fundamental trade-off in supervised machine learning?</p></li>
 <li><p>What is the Golden rule in supervised machine learning?</p></li>
+<li><p>Scenarios for data leakage</p></li>
 </ul>
 </section>
 <section id="pros-cons-parameters-and-hyperparameters-of-different-ml-models">
@@ -530,15 +531,105 @@ <h3>Pros, cons, parameters and hyperparameters of different ML models<a class="h
 <li><p>KNNs, SVM RBFs</p></li>
 <li><p>Linear models</p></li>
 <li><p>Random forests</p></li>
-<li><p>Grading Boosyinh, LGBM, CatBoost</p></li>
+<li><p>Grading Boosting, LGBM, CatBoost</p></li>
 <li><p>Stacking, averaging</p></li>
 </ul>
+<p><strong>Comparison of models</strong></p>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p><strong>Model</strong></p></th>
+<th class="head"><p>Parameters and hyperparameters</p></th>
+<th class="head"><p><strong>Strengths</strong></p></th>
+<th class="head"><p><strong>Weaknesses</strong></p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><strong>Decision Trees</strong></p></td>
+<td><p></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p><strong>KNNs</strong></p></td>
+<td><p></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-even"><td><p><strong>SVM RBF</strong></p></td>
+<td><p></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p><strong>Linear models</strong></p></td>
+<td><p></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-even"><td><p><strong>Random forests</strong></p></td>
+<td><p></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p><strong>Gradient boosting</strong></p></td>
+<td><p></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-even"><td><p><strong>Stacking</strong></p></td>
+<td><p></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p><strong>Averaging</strong></p></td>
+<td><p></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+</tbody>
+</table>
+</div>
+<p><br><br></p>
 </section>
 <section id="preprocessing">
 <h3>Preprocessing<a class="headerlink" href="#preprocessing" title="Link to this heading">#</a></h3>
 <ul class="simple">
 <li><p>What are various data preprocessing steps such as scaling, OHE, ordinal encoding, and handling missing values. Why and when each step is necessary?</p></li>
 </ul>
+<p><strong><code class="docutils literal notranslate"><span class="pre">sklearn</span></code> Transformers</strong></p>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p><strong>Transformer</strong></p></th>
+<th class="head"><p>Hyperparameters</p></th>
+<th class="head"><p><strong>When to use?</strong></p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">SimpleImputer</span></code></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">StandardScaler</span></code></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">OneHotEncoder</span></code></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">OrdinalEncoder</span></code></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">CountVectorizer</span></code></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">TransformedTargetRegressor</span></code></p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+</tbody>
+</table>
+</div>
 <p>Let’s bring back our quiz2 grades toy dataset.</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
@@ -875,6 +966,29 @@ <h3>Hyperparameter optimization<a class="headerlink" href="#hyperparameter-optim
 <li><p>What are two different tools provided by sklearn for hyperparameter optimization?</p></li>
 <li><p>What is optimization bias?</p></li>
 </ul>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p><strong>Method</strong></p></th>
+<th class="head"><p>Strengths/Weaknesses</p></th>
+<th class="head"><p><strong>When to use?</strong></p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p>Nested for loops</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p>Grid search</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-even"><td><p>Random search</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+</tbody>
+</table>
+</div>
 </section>
 <section id="evaluation-metrics">
 <h3>Evaluation metrics<a class="headerlink" href="#evaluation-metrics" title="Link to this heading">#</a></h3>
@@ -886,6 +1000,70 @@ <h3>Evaluation metrics<a class="headerlink" href="#evaluation-metrics" title="Li
 <li><p>What’s the main difference between AP score and F1 score?</p></li>
 <li><p>What are advantages of RMSE or MAPE over MSE?</p></li>
 </ul>
+<p><strong>Classification Metrics</strong></p>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p><strong>Metric</strong></p></th>
+<th class="head"><p>How to generate/calculate?</p></th>
+<th class="head"><p><strong>When to use?</strong></p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p>Accuracy</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p>Precision</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-even"><td><p>Recall</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p>F1-score</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-even"><td><p>AP score</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p>AUC</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+</tbody>
+</table>
+</div>
+<p><strong>Regression Metrics</strong></p>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p><strong>Metric</strong></p></th>
+<th class="head"><p>How to generate/calculate?</p></th>
+<th class="head"><p><strong>When to use?</strong></p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p>MSE</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p>RMSE</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-even"><td><p>r2 score</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+<tr class="row-odd"><td><p>MAPE</p></td>
+<td><p></p></td>
+<td><p></p></td>
+</tr>
+</tbody>
+</table>
+</div>
 </section>
 <section id="ensembles">
 <h3>Ensembles<a class="headerlink" href="#ensembles" title="Link to this heading">#</a></h3>

diff --git a/searchindex.js b/searchindex.js