Predicting Wine Quality

Group Members: Noelle Ferrari and Katarina Salcedo

Motivation

Provide the Washington Winegrower's Association with a model that predicts a wine's quality score, from 1 to 10, based upon its chemical composition. The goal of our model is to help improve the association's performance in competitions.

Data

Our dataset contains 6497 wines and has information on 11 chemical features of each wine. These features include fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates and alcohol.

Methodology

The red and white wine datasets were combined, and two categorical columns were added stating whether each wine was red or white. Before a baseline linear regression model was run, the linearity and multicollinearity assumptions were checked and the dataset was adjusted accordingly. From the baseline model, we went through multiple iterations, transforming and manipulating the data after each one, to try to increase our R-squared value, lower our RMSE and ensure all p-values were significant.
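The combine-and-check step can be sketched as follows. This is a minimal illustration, not the code in wine_regression_analysis.ipynb: the data is synthetic, the two feature columns are made up, and the `vif` helper is a hypothetical name for a hand-rolled variance inflation factor, one common way to check multicollinearity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the red and white tables (NOT the real data).
# Each row is a wine; the two columns are made-up chemical features.
red = rng.normal(size=(50, 2))
white = rng.normal(size=(80, 2))

# Stack the two tables and append two indicator columns (is_red, is_white).
features = np.vstack([red, white])
is_red = np.concatenate([np.ones(len(red)), np.zeros(len(white))])
is_white = 1.0 - is_red
combined = np.column_stack([features, is_red, is_white])


def vif(X, j):
    """Variance inflation factor of column j (hypothetical helper).

    Regresses column j on the remaining columns plus an intercept and
    returns 1 / (1 - R^2); returns infinity when the fit is exact.
    """
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    return np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)


# is_red and is_white always sum to 1, so each is perfectly predicted by
# the other; dropping one of them removes that collinearity.
reduced = combined[:, :3]
```

In this sketch, `vif(combined, 2)` is infinite because the two indicator columns are redundant, while every column of `reduced` has a VIF close to 1. Dropping one indicator is one plausible kind of adjustment; the adjustments actually made to the dataset may differ.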

Results

Our baseline model has an R-squared value of 0.290 and several coefficients with statistically insignificant p-values. However, the RMSE values for the train and test sets are 0.732 and 0.735, respectively, indicating that the model does a reasonable job of predicting quality and is neither overfit nor underfit to the data.

Through several iterations we were able to improve the R-squared to 0.300 and reduce RMSE to 0.727 for the train set and 0.730 for the test set. All p-values are statistically significant, and there are no collinearity issues with this model. We were not able to improve the R-squared further despite repeated attempts.
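For readers unfamiliar with the two metrics, the sketch below shows how R-squared and train/test RMSE can be computed for an ordinary least squares fit. It assumes synthetic data and an 80/20 split chosen for illustration; the function names are hypothetical and the numbers it produces are unrelated to the results reported above.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def r_squared(y, y_hat):
    """Share of the variance in y explained by the predictions."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y, y_hat):
    """Root mean squared error, in the same units as y."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Synthetic data (NOT the wine dataset): one informative feature, two noise
# features, and a noisy quality-like target.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = 6.0 + 0.4 * X[:, 0] + rng.normal(scale=0.7, size=1000)

# Illustrative 80/20 train/test split.
idx = rng.permutation(len(y))
train, test = idx[:800], idx[800:]

coef = fit_ols(X[train], y[train])
train_r2 = r_squared(y[train], predict(coef, X[train]))
train_rmse = rmse(y[train], predict(coef, X[train]))
test_rmse = rmse(y[test], predict(coef, X[test]))
```

A train RMSE far below the test RMSE would point to overfitting; similar values on both sets, as in the results above, suggest the model generalizes about as well as it fits.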

Conclusions

Our final model explains 30% of the variation in wine quality using the wine's physicochemical properties.

Next Steps

In order to improve predictions for this dataset, we believe that either additional data is needed to refine predictions or a different modelling technique should be used instead.
