
EC 524/424, Winter 2025

Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by Ed Rubin and Andrew Dickinson.

Schedule

Lecture Tuesdays and Thursdays, 10:00a–11:20a (Pacific), 101 McKenzie

Lab Friday, 2:00p–2:50p (Pacific), 195 Anstett

Office hours

  • Ed Rubin Tu. 2:30p–3:30p (PLC 530)
  • Andrew Dickinson We. 3p–4p (Zoom)

Syllabus

Syllabus

Books

Required books

Suggested books

Lecture notes

Note: Links to topics that we have not yet covered lead to older slides. I will update the links to the new slides as we work our way through the term.

000 - Overview (Why predict?)

  1. Why do we have a class on prediction?
  2. How is prediction (and how are its tools) different from causal inference?
  3. Motivating examples

Formats .html | .pdf | .rmd

Readings Introduction in ISL

001 - Statistical learning foundations

  1. Why do we have a class on prediction?
  2. How is prediction (and how are its tools) different from causal inference?
  3. Motivating examples

Formats .html | .pdf | .rmd

Readings

Supplements Unsupervised character recognition

002 - Model accuracy

  1. Model accuracy
  2. Loss for regression and classification
  3. The bias-variance tradeoff
  4. The Bayes classifier
  5. KNN

Formats .html | .pdf | .rmd

Readings

  • ISL Ch2–Ch3
  • Optional: 100ML Preface and Ch1–Ch4

003 - Resampling methods

  1. Review
  2. The validation-set approach
  3. Leave-one-out cross validation
  4. k-fold cross validation
  5. The bootstrap

Formats .html | .pdf | .rmd

Readings

  • ISL Ch5
  • Optional: 100ML Ch5

004 - Linear regression strikes back

  1. Returning to linear regression
  2. Model performance and overfit
  3. Model selection—best subset and stepwise
  4. Selection criteria

Formats .html | .pdf | .Rmd

Readings

  • ISL Ch3
  • ISL Ch6.1

In between: tidymodels-ing

005 - Shrinkage methods

(AKA: Penalized or regularized regression)

  1. Ridge regression
  2. Lasso
  3. Elasticnet

Formats .html | .pdf | .Rmd

Readings

  • ISL Ch4
  • ISL Ch6

006 - Classification intro

  1. Introduction to classification
  2. Why not regression?
  3. But also: Logistic regression
  4. Assessment: Confusion matrix, assessment criteria, ROC, and AUC

Formats .html | .pdf | .Rmd

Readings

  • ISL Ch4

007 - Decision trees

  1. Introduction to trees
  2. Regression trees
  3. Classification trees—including the Gini index, entropy, and error rate

Formats .html | .pdf | .rmd

Readings

  • ISL Ch8.1–Ch8.2

008 - Ensemble methods

  1. Introduction
  2. Bagging
  3. Random forests
  4. Boosting

Formats .html | .pdf | .rmd

Readings

  • ISL Ch8.2

009 - Support vector machines

  1. Hyperplanes and classification
  2. The maximal margin hyperplane/classifier
  3. The support vector classifier
  4. Support vector machines

Formats .html | .pdf | .rmd

Readings

  • ISL Ch9

010 - Dimensionality reduction and unsupervised learning

  1. MNIST dataset (machines with vision)
  2. K-means clustering
  3. Principal component analysis (PCA)
  4. UMAP

Formats .html | .pdf | .Rmd
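
For the clustering and PCA topics, here is a base-R sketch on the built-in `iris` measurements; the data and the choice of three clusters are just stand-ins for any scaled numeric matrix (MNIST included), not the slides' example.

```r
# Scale first: both K-means and PCA are sensitive to units
X <- scale(iris[, 1:4])

# K-means: partition observations into k clusters
set.seed(524)
km <- kmeans(X, centers = 3, nstart = 25)
table(km$cluster)

# PCA: orthogonal components ordered by the variance they explain
pca <- prcomp(X)
summary(pca)        # proportion of variance per component
head(pca$x[, 1:2])  # observations projected onto the first two PCs
```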

Projects

Past, present, and future projects.

000 Predicting sales price in housing data (Kaggle)

Due: Friday 31 January 2025 by midnight (before 11:59 PM) Pacific

Help:

001 Validation and out-of-sample performance

Due: Thursday 13 February 2025 by midnight (before 11:59 PM) Pacific

002 Penalized regression, logistic regression, and classification

003 Trees, ensembles, and imputation

Help from class

Class project

Outline of the project

Topic due by midnight on 09 February 2025.

Final project submission due by 11:59p on 12 March 2025.

Final exam

In-class exam: Monday (17 March 2025) at 10:15a–12:15p
Note: Previous years had a take-home portion of the final exam. This year, we will only have an in-class exam.

Prep materials
Previous take-home exam: 2023 | 2024
Previous in-class exams: 2023 | 2024
Note: I am not providing keys.

Lab notes

Approximate/planned topics...

000 - Workflow and cleaning

  1. General "best practices" for coding
  2. Working with RStudio
  3. The pipe (%>%)
  4. Cleaning and Kaggle follow up

Formats .html | .pdf | .Rmd
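
If the pipe (topic 3 above) is new to you, here is a minimal sketch of how it chains operations, using the built-in `mtcars` data rather than the lab's data:

```r
library(dplyr)

# Without the pipe: nested calls read inside-out
summarize(group_by(filter(mtcars, cyl == 4), gear), mean_mpg = mean(mpg))

# With the pipe: the same steps read left to right
mtcars %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarize(mean_mpg = mean(mpg))
```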

001 - Workflow and cleaning: An example

Follow these steps to get started on the lab this week.

  1. Install Quarto. Follow this link, download the installer for your operating system, and follow the instructions to install Quarto
  2. Download (and unzip) the Housing data and the Quarto document (download button in the top-right corner of the page)
  3. Create a project in RStudio in a separate folder
  4. Copy/move the data files and the Quarto document to a folder dedicated to this lab
  5. Open the Quarto document in RStudio and follow the instructions to get started on this week's lab

Formats .html | .qmd

002 - Validation

  1. Creating a training and validation data set from your observations dataframe in R
  2. Writing a function to iterate over multiple models to test and compare MSEs (see the sketch below)

Download: This zip file

Formats .html | .qmd
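
A minimal base-R sketch of the two steps above, using simulated data in place of the lab's dataset (the data, the 80/20 split, and the candidate formulas are all made up for illustration):

```r
# Simulated stand-in for the lab's observations dataframe
set.seed(524)
obs_df <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
obs_df$y <- 1 + 2 * obs_df$x1 - obs_df$x2 + rnorm(500)

# 80/20 training-validation split
train_ids <- sample(nrow(obs_df), size = 0.8 * nrow(obs_df))
train_df  <- obs_df[train_ids, ]
valid_df  <- obs_df[-train_ids, ]

# Fit a model on the training set; return its validation-set MSE
validation_mse <- function(model_formula) {
  fit <- lm(model_formula, data = train_df)
  mean((valid_df$y - predict(fit, newdata = valid_df))^2)
}

# Iterate over candidate models and compare their validation MSEs
candidates <- list(y ~ x1, y ~ x2, y ~ x1 + x2, y ~ poly(x1, 2) + x2)
sapply(candidates, validation_mse)
```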

003 - Practice using tidymodels

  1. Cleaning data quickly and efficiently with tidymodels

Formats .html
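
As a rough sketch of the kind of cleaning a tidymodels recipe handles, here is an example on made-up data; the column names and the particular steps are assumptions for illustration, not the lab's actual recipe.

```r
library(tidymodels)

# Made-up data with missing values, a categorical column, and unscaled numerics
set.seed(524)
toy_df <- tibble(
  price   = rlnorm(200, meanlog = 12),
  sqft    = rnorm(200, mean = 1500, sd = 400),
  quality = sample(c("low", "mid", "high", NA), 200, replace = TRUE)
)

# A recipe declares the roles (outcome ~ predictors) and chains cleaning steps
clean_recipe <- recipe(price ~ ., data = toy_df) %>%
  step_impute_mode(all_nominal_predictors()) %>%  # fill missing categories
  step_dummy(all_nominal_predictors()) %>%        # dummy-encode categoricals
  step_normalize(all_numeric_predictors())        # center and scale numerics

# prep() estimates the steps; bake() applies them to data
clean_recipe %>% prep() %>% bake(new_data = NULL) %>% head()
```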

004 - Practice using tidymodels (continued)

  1. An introduction to preprocessing with tidymodels (refresher from last week)
  2. An introduction to modeling with tidymodels
  3. An introduction to resampling, model tuning, and workflows with tidymodels (will finish up next week)
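
To tie the three pieces together, here is a compact sketch of a tidymodels pipeline on simulated data; the dataset, recipe step, and 5-fold choice are illustrative assumptions rather than the lab's exact setup.

```r
library(tidymodels)

# Simulated regression data (the lab's dataset would slot in here)
set.seed(524)
toy_df <- tibble(
  price = rlnorm(300, meanlog = 12),
  sqft  = rnorm(300, mean = 1500, sd = 400),
  age   = sample(0:100, 300, replace = TRUE)
)

# Preprocessing: a recipe
toy_recipe <- recipe(price ~ ., data = toy_df) %>%
  step_normalize(all_numeric_predictors())

# Modeling: a parsnip model specification
lm_spec <- linear_reg() %>% set_engine("lm")

# A workflow bundles the recipe and the model
toy_wf <- workflow() %>%
  add_recipe(toy_recipe) %>%
  add_model(lm_spec)

# Resampling: estimate out-of-sample performance with 5-fold cross validation
toy_folds <- vfold_cv(toy_df, v = 5)
toy_wf %>%
  fit_resamples(resamples = toy_folds) %>%
  collect_metrics()
```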

005 - Summarizing tidymodels

  1. Summarizing tidymodels
  2. Combining pre-split data and then defining a custom split

006 - Penalized regression in tidymodels + functions + loops

  1. Running a Ridge, Lasso, or Elasticnet logistic regression in tidymodels (see the sketch below)
  2. A short lesson in writing functions and loops in R
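
The sketch below shows one way this might look on simulated data, with an assumed elasticnet mixture of 0.5 (mixture = 0 would be ridge, 1 would be lasso); the data and tuning grid are illustrative, and the glmnet engine must be installed.

```r
library(tidymodels)

# Simulated binary-outcome data (stand-in for the lab's dataset)
set.seed(524)
toy_df <- tibble(
  x1 = rnorm(400),
  x2 = rnorm(400),
  y  = factor(ifelse(x1 - 0.5 * x2 + rnorm(400) > 0, "yes", "no"))
)

# Penalized logistic regression: mixture = 0 is ridge, 1 is lasso,
# anything in between is elasticnet; penalty sets the amount of shrinkage
logit_spec <- logistic_reg(penalty = tune(), mixture = 0.5) %>%
  set_engine("glmnet")

logit_wf <- workflow() %>%
  add_formula(y ~ x1 + x2) %>%
  add_model(logit_spec)

# Tune the penalty with 5-fold cross validation; compare models by AUC
folds <- vfold_cv(toy_df, v = 5)
logit_tuned <- tune_grid(
  logit_wf,
  resamples = folds,
  grid = grid_regular(penalty(), levels = 20),
  metrics = metric_set(roc_auc)
)
show_best(logit_tuned, metric = "roc_auc")
```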

007 - Finalizing a workflow in tidymodels: Example using a random forest

  1. Finalizing a workflow in tidymodels: Example using a random forest
  2. A short lesson in writing functions and loops in R (continued)
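
Here is a sketch of the finalizing step on simulated data, assuming the ranger engine is available; the dataset, the grid, and tuning only mtry are illustrative choices, not the lab's exact workflow.

```r
library(tidymodels)

# Simulated regression data (stand-in for the lab's dataset)
set.seed(524)
toy_df <- tibble(
  y  = rnorm(400),
  x1 = rnorm(400),
  x2 = rnorm(400),
  x3 = rnorm(400)
)

# Random forest with a tunable number of predictors sampled at each split
rf_spec <- rand_forest(mtry = tune(), trees = 200) %>%
  set_engine("ranger") %>%
  set_mode("regression")

rf_wf <- workflow() %>%
  add_formula(y ~ .) %>%
  add_model(rf_spec)

# Tune mtry with 5-fold cross validation
folds <- vfold_cv(toy_df, v = 5)
rf_tuned <- tune_grid(rf_wf, resamples = folds, grid = tibble(mtry = 1:3))

# 'Finalize' the workflow: plug in the best mtry, then refit on the full data
best_mtry <- select_best(rf_tuned, metric = "rmse")
final_fit <- rf_wf %>%
  finalize_workflow(best_mtry) %>%
  fit(data = toy_df)

predict(final_fit, new_data = toy_df) %>% head()
```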

Prediction in the media

Misc

A funny conversation with ChatGPT about what is real.
Parts: 1 2 3 4

Additional resources

Jobs

I wrote a very short guide to finding a job.

R

Data Science

Spatial data
