This project focuses on exploratory data analysis (EDA), statistical analysis, and modeling data with linear regression.
There are two topics available:
- King County Housing Data: This dataset contains information about home sales in King County (USA).
- US Bank Wages: This dataset contains information about the wages of employees of a US bank.
Whichever dataset you choose, your task is to perform an extensive EDA and to train a simple linear regression model. For a more detailed task description, have a look at the assignment.
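To make "simple linear regression" concrete, here is a minimal sketch of an ordinary least squares fit with one predictor, written in plain Python so it runs without extra packages. The function names (`fit_simple_linear_regression`, `r_squared`) and the toy values are assumptions made for illustration; they are not part of the assignment and are not taken from either dataset. In the project itself you would more likely use a library installed via pip.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two points")
    x_mean, y_mean = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y that the fitted line explains."""
    y_mean = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up toy values (hypothetical, not from either dataset): exactly y = 2x + 1.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(toy_x, toy_y)
fit_quality = r_squared(toy_x, toy_y, slope, intercept)
print(slope, intercept, fit_quality)
```

With real data, `toy_x` would be a single feature column and `toy_y` the target column; which columns those are depends on the dataset you pick and on what your EDA suggests.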
Please create a new repository for this project. Make sure you also create and activate a new virtual environment inside your project repo, for example with the commands below. You can then use pip to install all the packages you need during the project into this environment.
pyenv local 3.8.5
python -m venv .venv
source .venv/bin/activate