Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by Ed Rubin and Andrew Dickinson.
Lecture Tuesdays and Thursdays, 10:00a–11:20a (Pacific), 101 McKenzie
Lab Friday, 2:00p–2:50p (Pacific), 195 Anstett
Office hours
- Ed Rubin Tu. 2:30p–3:30p (PLC 530)
- Andrew Dickinson We. 3p–4p (Zoom)
- R for Data Science
- Introduction to Data Science (not available without purchase)
- The Elements of Statistical Learning
- Data Science for Public Policy (ebook available through UO library)
Note: Links to topics that we have not yet covered lead to older slides. I will update links to the new slides as we work our way through the term/slides.
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
Readings: Introduction in ISL
001 - Statistical learning foundations
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
Readings
- Prediction Policy Problems by Kleinberg et al. (2015)
- ISL Ch1
- ISL Start Ch2
Supplements: Unsupervised character recognition
- Model accuracy
- Loss for regression and classification
- The variance-bias tradeoff
- The Bayes classifier
- KNN
Readings
- ISL Ch2–Ch3
- Optional: 100ML Preface and Ch1–Ch4
- Review
- The validation-set approach
- Leave-one-out cross validation
- k-fold cross validation
- The bootstrap
Readings
- ISL Ch5
- Optional: 100ML Ch5
004 - Linear regression strikes back
- Returning to linear regression
- Model performance and overfit
- Model selection—best subset and stepwise
- Selection criteria
Readings
- ISL Ch3
- ISL Ch6.1
In between: tidymodels-ing
- An introduction to preprocessing with tidymodels. (Kaggle notebook)
- An introduction to modeling with tidymodels. (Kaggle notebook)
- An introduction to resampling, model tuning, and workflows with tidymodels (Kaggle notebook)
- Introduction to tidymodels: Follow up for Kaggle
(AKA: Penalized or regularized regression)
- Ridge regression
- Lasso
- Elasticnet
Readings
- ISL Ch4
- ISL Ch6
- Introduction to classification
- Why not regression?
- But also: Logistic regression
- Assessment: Confusion matrix, assessment criteria, ROC, and AUC
Readings
- ISL Ch4
- Introduction to trees
- Regression trees
- Classification trees—including the Gini index, entropy, and error rate
Readings
- ISL Ch8.1–Ch8.2
- Introduction
- Bagging
- Random forests
- Boosting
Readings
- ISL Ch8.2
- Hyperplanes and classification
- The maximal margin hyperplane/classifier
- The support vector classifier
- Support vector machines
Readings
- ISL Ch9
010 - Dimensionality reduction and unsupervised learning
- MNIST dataset (machines with vision)
- K-means clustering
- Principal component analysis (PCA)
- UMAP
Past, present, and future projects.
000 Predicting sales price in housing data (Kaggle)
Due: Friday 31 January 2025 by midnight (before 11:59 PM) Pacific
Help:
- A simple example/walkthrough
- Kaggle notebooks (from Connor Lennon)
001 Validation and out-of-sample performance
Due: Thursday 13 February 2025 by midnight (before 11:59 PM) Pacific
002 Penalized regression, logistic regression, and classification
003 Trees, ensembles, and imputation
Topic due by midnight on 09 February 2025.
Final project submission due by 11:59p on 12 March 2025.
In-class exam: Monday (17 March 2025) at 10:15a–12:15p
Note: Previous years had a take-home portion of the final exam. This year, we will only have an in-class exam.
Prep materials
Previous take-home exam: 2023 | 2024
Previous in-class exams: 2023 | 2024
Note: I am not providing keys.
Approximate/planned topics...
- General "best practices" for coding
- Working with RStudio
- The pipe (%>%)
- Cleaning and Kaggle follow up
001 - Workflow and cleaning: An example
Follow these steps to get started on the lab this week.
- Install Quarto. Follow this link, download the installer for your operating system, and follow the instructions to install Quarto
- Download (and unzip) the Housing data and the Quarto document (download button top right corner of page)
- Create a project in RStudio in a separate folder
- Copy/move the data files and the Quarto document to a folder dedicated to this lab
- Open the Quarto document in RStudio and follow the instructions to get started on this week's lab
- Creating a training and validation data set from your observations dataframe in R
- Writing a function to iterate over multiple models to test and compare MSEs
Download: This zip file
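The two steps above (splitting observations into training and validation sets, then looping over models to compare MSEs) can be sketched in base R. This is a minimal illustration, not the lab's own code: the simulated data frame `obs`, the 80/20 split, the helper `val_mse()`, and the three model formulas are all assumptions made for the example.

```r
# Simulated data stand in for the lab's housing data (an assumption).
set.seed(524)
n <- 200
obs <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
obs$y <- 1 + 2 * obs$x1 - obs$x2^2 + rnorm(n)

# Hold out 20% of the rows as the validation set; train on the rest.
val_rows <- sample(n, size = floor(0.2 * n))
train <- obs[-val_rows, ]
valid <- obs[val_rows, ]

# Hypothetical helper: fit one formula on the training set and
# return its mean squared error on the validation set.
val_mse <- function(f, train_df, valid_df) {
  fit <- lm(f, data = train_df)
  pred <- predict(fit, newdata = valid_df)
  mean((valid_df$y - pred)^2)
}

# Candidate models (illustrative specifications only).
models <- list(
  linear    = y ~ x1,
  both      = y ~ x1 + x2,
  quadratic = y ~ x1 + x2 + I(x2^2)
)

# Iterate over the models and collect validation MSEs.
mse <- sapply(models, val_mse, train_df = train, valid_df = valid)
sort(mse)
```

Because each model is scored on rows it never saw during fitting, the sorted MSEs give a rough out-of-sample ranking; with real data, the same loop works once `obs` and the formulas are swapped for the lab's variables.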
003 - Practice using tidymodels
- Cleaning data quickly and efficiently with tidymodels
Formats: .html
004 - Practice using tidymodels (continued)
- An introduction to preprocessing with tidymodels (refresher from last week)
- An introduction to modeling with tidymodels
- An introduction to resampling, model tuning, and workflows with tidymodels (will finish up next week)
005 - Summarizing tidymodels
- Summarizing tidymodels
- Combining pre-split data together and then defining a custom split
006 - Penalized regression in tidymodels + functions + loops
- Running a Ridge, Lasso, or Elasticnet logistic regression in tidymodels
- A short lesson in writing functions and loops in R
007 - Finalizing a workflow in tidymodels: Example using a random forest
- Finalizing a workflow in tidymodels: Example using a random forest
- A short lesson in writing functions and loops in R (continued)
- NPR: Google's new AI chatbot made a $100 billion mistake in a demo ad
- NYT: Disinformation Researchers Raise Alarms About A.I. Chatbots
- NPR: She was denied entry to a Rockettes show — then the facial recognition debate ignited
- LA Times: Nobody knows how widespread illegal cannabis grows are in California. So we mapped them
- NYT: Can A.I. Write Recipes Better Than Humans? We Put It to the Ultimate Test
- ChatGPT
- Business Insider: List of exams ChatGPT has passed
- NPR: 'Everybody is cheating': Why this teacher has adopted an open ChatGPT policy
- How Should Schools Respond to ChatGPT?
- Energy Institute: Can ChatGPT Save the Planet?
- MIT Tech Review: Here’s how Microsoft could use ChatGPT
- NPR: This 22-year-old is trying to save us from ChatGPT before it changes writing forever
- NYT: How ChatGPT Hijacks Democracy
- NYT: Don’t Ban ChatGPT in Schools. Teach With It.
- NYT: How to Use ChatGPT and Still Be a Good Person
- NPR: A new AI chatbot might do your homework for you. But it's still not an A+ student
- NYT: The Brilliance and Weirdness of ChatGPT
- Military applications
A funny conversation with ChatGPT about what is real.
Parts: 1 2 3 4
I wrote a very short guide to finding a job.
- UO library resources/workshops
- RStudio's recommendations for learning R, plus cheatsheets, books, and tutorials
- YaRrr! The Pirate’s Guide to R (free online)
- Eugene R Users
- Happy Git and GitHub for the useR by Jenny Bryan, the "STAT 545 TAs", and Jim Hester
- Python Data Science Handbook by Jake VanderPlas
- Elements of AI
- Caltech professor Yaser Abu-Mostafa: Lectures about machine learning on YouTube
- From Google:
- Geocomputation with R (free online)
- Spatial Data Science (free online)
- Applied Spatial Data Analysis with R