The goal of this project is to assess the reliability of hard drive models and to build a model that predicts whether a hard drive will fail early. Stakeholders include users of the specific hard drive models in our dataset and consumers shopping for a new hard drive. Project deliverables include a Jupyter notebook documenting our process, a handout listing the best and worst hard drives, and a slide presentation summarizing our findings for a general audience.
Raw data from 2016 to 2019, totaling over 34 gigabytes, was acquired from Backblaze.com. The data is aggregated by serial number: for each unique serial number, we retrieve the maximum value of each of the top five SMART stats identified as indicators of hard drive failure.
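A minimal sketch of the per-serial-number aggregation, assuming the daily Backblaze CSV layout (one row per drive per day, with `serial_number`, `model`, `failure`, and `smart_<id>_raw` columns). The five SMART IDs used here (5, 187, 188, 197, 198) are an assumption: they are the stats Backblaze has written about as failure indicators, and this project's own list may differ. `aggregate_by_serial` is a hypothetical helper, and carrying `failure` through as a max (1 if the drive ever failed) is also an assumption. The inline sample rows are illustrative, not real Backblaze data.

```python
import csv
import io
from typing import Dict, Iterable

# ASSUMPTION: the five SMART stats are Backblaze's commonly cited indicators:
# 5 Reallocated Sectors Count, 187 Reported Uncorrectable Errors,
# 188 Command Timeout, 197 Current Pending Sector Count,
# 198 Uncorrectable Sector Count.
SMART_COLUMNS = [f"smart_{i}_raw" for i in (5, 187, 188, 197, 198)]


def aggregate_by_serial(rows: Iterable[Dict[str, str]]) -> Dict[str, dict]:
    """Hypothetical helper: collapse daily rows into one record per serial number.

    Keeps the model string, the max of each SMART raw value, and
    failure = 1 if any daily row for that drive reports a failure.
    Blank or missing SMART cells are skipped; a stat never reported stays None.
    """
    drives: Dict[str, dict] = {}
    for row in rows:
        serial = row["serial_number"]
        rec = drives.setdefault(
            serial,
            {"model": row["model"], "failure": 0, **{c: None for c in SMART_COLUMNS}},
        )
        rec["failure"] = max(rec["failure"], int(row["failure"]))
        for col in SMART_COLUMNS:
            value = row.get(col)
            if value in ("", None):
                continue  # no reading for this stat on this day
            value = float(value)
            if rec[col] is None or value > rec[col]:
                rec[col] = value
    return drives


# Illustrative rows in the Backblaze column layout (not real data).
SAMPLE = """date,serial_number,model,failure,smart_5_raw,smart_187_raw,smart_188_raw,smart_197_raw,smart_198_raw
2019-01-01,Z1,ST4000DM000,0,0,0,0,0,0
2019-01-02,Z1,ST4000DM000,1,8,2,0,16,16
2019-01-01,W7,WDC WD30EFRX,0,0,,0,0,0
"""

if __name__ == "__main__":
    drives = aggregate_by_serial(csv.DictReader(io.StringIO(SAMPLE)))
    for serial, rec in drives.items():
        print(serial, rec)
```

With pandas, the same reduction is roughly `df.groupby("serial_number")[SMART_COLUMNS + ["failure"]].max()`. Streaming rows through a plain loop instead keeps memory proportional to the number of drives rather than to the full 34 GB of daily records.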
Access the aggregated data: Hard Drive Data
Hypotheses:
- The five identified SMART (Self-Monitoring, Analysis and Reporting Technology) stats will indicate hard drive failure
- Hard drive failure rates will vary by model
- Hard drive failure rates will vary by manufacturer
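The last two hypotheses come down to comparing failure rates across groups. A minimal sketch, assuming one `(model, failure)` pair per serial number; `failure_rates` and `manufacturer_from_model` are hypothetical helpers, and the prefix-based manufacturer mapping is an assumption about how model strings appear in the Backblaze data (Seagate models starting with "ST", other vendors leading with a token such as "HGST" or "WDC"). The sample pairs are illustrative, not real results.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def manufacturer_from_model(model: str) -> str:
    """Hypothetical mapping from a model string to a manufacturer.

    ASSUMPTION: Seagate model strings start with "ST" and have no vendor word;
    other vendors lead with a token like "HGST", "Hitachi", "WDC", or "TOSHIBA".
    """
    first = model.split()[0].upper()
    if first.startswith("ST"):
        return "Seagate"
    vendor_names = {
        "HGST": "HGST",
        "HITACHI": "Hitachi",
        "WDC": "Western Digital",
        "TOSHIBA": "Toshiba",
    }
    return vendor_names.get(first, "Unknown")


def failure_rates(
    drives: Iterable[Tuple[str, int]],
    key: Callable[[str], str] = lambda model: model,
) -> Dict[str, float]:
    """Share of drives that failed, grouped by key(model).

    Each (model, failure) pair stands for one serial number; failure is 1 if
    the drive failed during the observation window, else 0.
    """
    totals: Dict[str, int] = defaultdict(int)
    failed: Dict[str, int] = defaultdict(int)
    for model, failure in drives:
        group = key(model)
        totals[group] += 1
        failed[group] += failure
    return {group: failed[group] / totals[group] for group in totals}


# Illustrative (model, failure) pairs, one per serial number (not real data).
SAMPLE_DRIVES = [
    ("ST4000DM000", 1),
    ("ST4000DM000", 0),
    ("ST8000NM0055", 0),
    ("HGST HMS5C4040BLE640", 0),
    ("WDC WD30EFRX", 1),
]

if __name__ == "__main__":
    print(failure_rates(SAMPLE_DRIVES))                               # by model
    print(failure_rates(SAMPLE_DRIVES, key=manufacturer_from_model))  # by manufacturer
```

This computes a plain proportion of drives that failed. Backblaze's own reports use an annualized failure rate, failures / (drive days / 365) × 100, which corrects for drives being observed for different lengths of time. Either way, models with only a handful of drives give noisy rates and deserve caution when ranking best and worst drives.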
Document containing common terminology used throughout the project, with definitions and notes: Data dictionary link
Index of SMART attributes with ID number, attribute name, and description: SMART Attribute Index
Slide presentation tailored for a general audience: Project Slides