More sections in linear_regression and added the exact solution
Fady Bishara committed Jul 31, 2024
1 parent 0288bad commit 8ecf8dc
Showing 2 changed files with 85 additions and 3 deletions.
66 changes: 64 additions & 2 deletions docs/linear_regression.md
@@ -122,8 +122,8 @@

The following pseudocode is adapted from the SGD algorithm (8.1) in the [Deep Learning](https://www.deeplearningbook.org/) book by Goodfellow, Bengio, and Courville (see chapter 8, page 291):
```ruby
-Require: learning rate, eta
-Require: initial parameters, w and b
+input: learning rate, eta
+input: initial parameters, w and b
k = 1
while (do another epoch == True) do
loop over minibatches
@@ -134,3 +134,65 @@
k = k + 1
end while
```
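
To make this concrete, here is a minimal Python sketch of the loop above for the model $y = wx + b$ with a mean-squared-error loss. The function name `sgd_fit`, the stopping rule (a fixed number of epochs), and the default hyperparameters are illustrative choices, not part of the pseudocode:

```python
import numpy as np

def sgd_fit(x, t, eta=0.01, n_epochs=50, batch_size=32, rng=None):
    """Minimal SGD sketch for the model y = w * x + b with MSE loss."""
    rng = np.random.default_rng() if rng is None else rng
    w, b = 0.0, 0.0  # initial guesses for the parameters
    losses = []      # per-batch loss, useful for the plots below
    for epoch in range(n_epochs):
        # shuffle the data at the start of each epoch
        idx = rng.permutation(len(x))
        n_batches = max(1, len(x) // batch_size)
        for batch in np.array_split(idx, n_batches):
            xb, tb = x[batch], t[batch]
            err = (w * xb + b) - tb          # prediction minus target
            losses.append(np.mean(err**2))   # minibatch MSE
            # gradients of the MSE loss with respect to w and b
            grad_w = 2.0 * np.mean(err * xb)
            grad_b = 2.0 * np.mean(err)
            # gradient-descent update with learning rate eta
            w -= eta * grad_w
            b -= eta * grad_b
    return w, b, losses
```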

## Putting it all together

Okay, so now we have all the pieces we need to actually _do_ the regression. But first, we need some data. To have full control over the features (i.e., the _true_ underlying model) of this dataset, let's create some synthetic (artificial) data with `scikit-learn`.


### Making a synthetic dataset

```python
import numpy as np
from sklearn.datasets import make_regression

features, targets, coef = make_regression(
    n_samples=1000,
    n_features=1,
    n_targets=1,
    bias=15,
    noise=10,
    coef=True,
)
# NOTE: the features array returned by sklearn.datasets.make_regression has
# shape (n_samples, n_features), i.e., it is not 1d even if n_features=1
features = np.squeeze(features)
```
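
As a quick sanity check, you can inspect the shapes and the true generating parameters (the slope is returned because of `coef=True`; the intercept is the `bias` argument, 15):

```python
print(features.shape, targets.shape)  # (1000,) (1000,) after the squeeze
print("true slope:", coef)            # to compare against the fit later
```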

## Fit the parameters of the model and investigate your results

Using SGD as described [above](#the-gradient-descent-algorithm), fit the parameters of the model and compare them with the parameters used to generate the dataset.

Here are a few things you should carefully consider (and **experiment** with!):

<div class="annotate" markdown>
- The initial guesses for $w$ and $b$.
- The learning rate: a good starting value is something like 0.01 or 0.001, but play around and see what different values do (a small sweep sketch follows this list).
- The number of training epochs. (1) Did you ask for too many or too few? How can you tell (see the list below for hints)?
</div>
1. Epochs count how many times you use the entire dataset during training, i.e., how many times the outer loop of the SGD algorithm described above is executed.
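
A small sketch of such an experiment, sweeping the learning rate with the hypothetical `sgd_fit` helper from the sketch above:

```python
# sweep the learning rate and compare the fitted parameters
# (sgd_fit, features, and targets are defined in the sketches above)
for eta in [0.1, 0.01, 0.001]:
    w_fit, b_fit, losses = sgd_fit(features, targets, eta=eta, n_epochs=20)
    print(f"eta={eta}: w={w_fit:.3f}, b={b_fit:.3f}, final loss={losses[-1]:.3f}")
```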


And here are a few things you should _definitely_ do (a minimal plotting sketch follows the list):

- Plot the loss (at least `#!python plt.semilogy` or `#!python plt.loglog`) as a function of training steps (batches).
    * Is it smooth or does it have a lot of noise?
    * Are there features like a "knee" where the behavior qualitatively changes?
    * Did the loss reach an asymptotic value? If so, did the training continue for long after that?
- Plot the model parameters as a function of training steps.
- Plot the data (`#!python plt.scatter`) and the fitted model.
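
A minimal plotting sketch, assuming the per-batch losses were collected in a list called `losses` (as in the `sgd_fit` sketch above) and that `w` and `b` hold the fitted parameters:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# loss per training step on a log scale
ax1.semilogy(losses)
ax1.set_xlabel("training step (batch)")
ax1.set_ylabel("loss")

# data and the fitted line
ax2.scatter(features, targets, s=5, alpha=0.5, label="data")
xs = np.linspace(features.min(), features.max(), 100)
ax2.plot(xs, w * xs + b, color="red", label="fitted model")
ax2.set_xlabel("feature")
ax2.set_ylabel("target")
ax2.legend()
plt.show()
```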


!!! info "Exact solution"

    In this (simple) case, and with a bit of algebra, we can find a closed-form solution for the parameters $w$ and $b$. The loss function here is strictly convex and thus has a unique minimum. The solution can be found by solving the linear set of equations $\nabla\mathcal{L}=\vec{0}$ and is given by,

    \begin{equation*}
    w = \frac{\mathrm{cov}(X, T)}{\mathrm{var}\,X}\,,\qquad
    b = \overline{T} - w\,\overline{X}\,.
    \end{equation*}

    The capital letters denote the full vectors of features, $x$, and targets, $t$, in the dataset. The operators $\mathrm{cov}$ and $\mathrm{var}$ are the covariance and variance, respectively. They can be computed with the `numpy` functions `cov` and `var`. Finally, an overline as in $\overline{X}$ means the average (`numpy.mean`) over the features in the dataset, and similarly for the targets.
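
    A sketch of this computation, assuming the `features` and `targets` arrays from above (note the `ddof=1` argument so that the normalization of the variance matches that of `np.cov`, whose default is $N-1$):

    ```python
    import numpy as np

    # the off-diagonal element of the 2x2 covariance matrix is cov(X, T)
    w_exact = np.cov(features, targets)[0, 1] / np.var(features, ddof=1)
    b_exact = np.mean(targets) - w_exact * np.mean(features)
    print("exact solution:", w_exact, b_exact)
    ```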

22 changes: 21 additions & 1 deletion mkdocs.yml
@@ -1,12 +1,32 @@
 site_name: Machine Learning Tutorial
 
 repo_url: https://github.com/European-XFEL/machine_learning_tutorial.git
 
 theme:
   name: material
   palette:
-    scheme: preference
+    # Palette toggle for light mode
+    - scheme: default
+      toggle:
+        icon: material/brightness-7
+        name: Switch to dark mode
+
+    # Palette toggle for dark mode
+    - scheme: slate
+      toggle:
+        icon: material/brightness-4
+        name: Switch to light mode
+
+    # scheme: preference
   font:
     text: Roboto
     code: Roboto Mono
+
+  icon:
+    repo: fontawesome/brands/github
+
+  features:
+    - content.code.select
+    - content.code.copy
