This document outlines the improvements made to the project in response to feedback, with references to specific evidence such as commits, pull requests, or lines of code. Each section explains how the changes address the corresponding feedback.
Feedback Description:
No link was provided in the README file to the analysis results.
Changes Made:
Updated the README file to include a link to the analysis HTML file hosted on GitHub Pages:
https://ubc-mds.github.io/customer-term-deposits-predictor/analysis/customer-term-deposits-predictor.html.
Changed the GitHub Pages source setting to "root" so that the HTML file in the analysis folder can be linked to directly.
Evidence:
- Commit Message: Link added
Feedback Description:
The doc folder was unnecessary after updating GitHub Pages to serve from the "root."
Changes Made:
Deleted the doc folder from the repository.
Evidence:
- Commit Message: Removed the doc folder
Feedback Description:
No Creative Commons License was specified for the project report, as noted in the Milestone 1 feedback.
Changes Made:
Added a Creative Commons License to the project repository, following an example license from Tiffany's GitHub repository.
Evidence:
- Commit Message: Link to commit updating license
Feedback Description:
Milestone 1 feedback highlighted a violation of the "golden rule" by performing EDA before splitting the dataset, potentially causing data leakage.
Changes Made:
Refactored the workflow to ensure EDA is performed only on the training dataset after the data split.
Evidence:
- Commit Message: Refactored EDA to prevent data leakage
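The split-then-explore order can be sketched in plain Python. This is a minimal illustration, not the project's code: it assumes the data is already loaded as a list of row dictionaries, and the helpers split_rows and summarize_numeric, the "age" column, the 20% test fraction, and the seed are all hypothetical. A library-based split (for example scikit-learn's train_test_split) would serve the same purpose.

```python
import random
from statistics import mean, median


def split_rows(rows, test_fraction=0.2, seed=123):
    """Shuffle a copy of the rows and split them into (train, test) lists.

    Hypothetical helper: the split ratio and seed are illustrative only.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def summarize_numeric(rows, column):
    """Return simple summary statistics for one numeric column."""
    values = [row[column] for row in rows]
    return {"n": len(values), "mean": mean(values), "median": median(values)}


# Toy rows standing in for the customer data; "age" is an assumed column.
rows = [{"age": age} for age in range(20, 70)]

# Split FIRST, then run EDA on the training rows only, so nothing learned
# from the test rows can influence modelling decisions (no data leakage).
train_rows, test_rows = split_rows(rows)
train_summary = summarize_numeric(train_rows, "age")
print(train_summary)
```

The key design point is ordering: the test rows are set aside before any summary statistic or plot is computed, and they are not looked at again until final evaluation.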
Feedback Description:
The email address under the "Enforcement" section of the Code of Conduct should be tied to the team.
Changes Made:
Updated the Code of Conduct to include a team email under the "Enforcement" section.
Evidence:
- Commit Message: Add Code of Conduct
Feedback Description:
The naming of the attributes passed to download_customer_data.py was not descriptive enough: the attribute names only referred to generic paths, so it was unclear what each one was for. Milestone 1 feedback suggested giving these attributes more descriptive names.
Changes Made:
Updated the download_customer_data.py script to use clear, descriptive path names for its attributes, and added detailed documentation to the script for better usability and clarity.
Evidence:
- Commit Message: Improved attribute naming and added documentation
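One way to give a script descriptive argument names is sketched below with Python's standard argparse module. The option names --data-url and --raw-data-dir, the default directory, and the example URL are hypothetical and are not taken from download_customer_data.py, which may use a different command-line library.

```python
import argparse


def build_parser():
    """Build a command-line parser whose option names describe each path.

    The option names below are hypothetical examples, not the actual
    options of download_customer_data.py.
    """
    parser = argparse.ArgumentParser(
        description="Download the customer data and save it locally."
    )
    parser.add_argument(
        "--data-url",
        required=True,
        help="URL from which the raw customer data is downloaded.",
    )
    parser.add_argument(
        "--raw-data-dir",
        default="data/raw",
        help="Directory where the downloaded raw files are written.",
    )
    return parser


# Example invocation with an illustrative (hypothetical) URL.
args = build_parser().parse_args(
    ["--data-url", "https://example.com/bank.zip", "--raw-data-dir", "data/raw"]
)
print(args.raw_data_dir)
```

argparse converts each option name into an attribute with underscores (--raw-data-dir becomes args.raw_data_dir), so a descriptive flag name also yields a descriptive attribute inside the script.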
Feedback Description:
- Pinned package versions missing: almost none of the packages in environment.yml were pinned to specific versions.
- Platform-specific lockfile: the lockfile was created for osx-arm64 and was not compatible with other platforms.
Changes Made:
- Added version pinning to all packages in environment.yml to ensure consistent environments across different setups.
- Updated the lockfile to support multiple platforms (e.g., Linux, Windows, and macOS).
Evidence:
- Commit Message: Pinned package versions and updated lockfile
Feedback Description:
The file bank-full.csv was standalone and not categorized into either the processed or the raw folder.
Changes Made:
Moved bank-full.csv into the raw folder, as it represents raw input data.
Evidence:
- Commit Message: Moved bank-full.csv
Feedback Description:
Adding docstrings for each function would make the code easier to understand and use.
Changes Made:
Added docstrings to all functions in the src folder.
Evidence:
- src folder link: link to src folder with all functions and their docstrings
Feedback Description:
Consider adding checks for common issues in scripts, such as missing input files or directories.
Changes Made:
This was completed as part of the Milestone 4 requirements: tests for all functions were added to a tests file.
Evidence:
- test folder link: link to test folder
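The kind of early input check suggested in this feedback can be sketched using only pathlib. The function names check_input_file and check_input_dir and the example path are hypothetical, not the project's actual helpers; the NumPy-style docstring also shows the kind of function documentation requested in the docstring feedback.

```python
from pathlib import Path


def check_input_file(path):
    """Return ``path`` as a Path, or raise if it is not an existing file.

    Raises
    ------
    FileNotFoundError
        If nothing exists at ``path``.
    IsADirectoryError
        If ``path`` is a directory rather than a file.
    """
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Input file not found: {path}")
    if path.is_dir():
        raise IsADirectoryError(f"Expected a file but found a directory: {path}")
    return path


def check_input_dir(path):
    """Return ``path`` as a Path, or raise if it is not an existing directory."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Input directory not found: {path}")
    if not path.is_dir():
        raise NotADirectoryError(f"Expected a directory but found a file: {path}")
    return path


# Usage: fail early with a clear message instead of a confusing error later.
# The path below is a hypothetical file that is assumed not to exist.
try:
    check_input_file("data/raw/missing-example.csv")
    input_ok = True
except FileNotFoundError as err:
    input_ok = False
    print(f"Stopping: {err}")
```

Raising the built-in OSError subclasses keeps the errors easy to test for and gives users a message that names the offending path.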
Feedback Description:
The README instructions do not run validate.py. Anyone trying to reproduce this analysis should run the same validation to ensure that the input data is correct.
Changes Made:
The validation script was converted into a function and incorporated into the preprocessing script for better code flow, so the validation now runs whenever preprocessing is run.
Evidence:
- Commit message: Validate function called in preprocessed script
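A sketch of validation running inside preprocessing, assuming rows as produced by csv.DictReader. The functions validate_data and preprocess, the specific checks, and the "age" and "y" columns are illustrative assumptions rather than the project's actual code.

```python
def validate_data(rows, required_columns):
    """Check basic properties of the input rows and raise ValueError on failure.

    Hypothetical checks: the data is non-empty, the first row contains every
    required column, and no row has an empty value in those columns.
    """
    if not rows:
        raise ValueError("Input data is empty.")
    missing = set(required_columns) - set(rows[0])
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    for i, row in enumerate(rows):
        for column in required_columns:
            if row.get(column) in (None, ""):
                raise ValueError(f"Row {i} has no value for '{column}'.")
    return True


def preprocess(rows):
    """Validate the raw rows, then apply a placeholder preprocessing step."""
    # Validation runs first, so anyone rerunning preprocessing gets it automatically.
    validate_data(rows, required_columns=["age", "y"])
    # Placeholder transformation: encode the assumed target label as 0/1.
    return [{**row, "y": 1 if row["y"] == "yes" else 0} for row in rows]


raw_rows = [{"age": "58", "y": "no"}, {"age": "44", "y": "yes"}]
print(preprocess(raw_rows))
```

Calling the validation from inside preprocessing means the reproduction steps in the README do not need a separate validation command.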
Feedback Description:
The report could not be rendered using the container environment.
Changes Made:
The README file was updated with more specific instructions for running the environment in the Docker container.
Evidence:
- Commit message: README file instructions updated
Summary of Changes:
- Updated the README file with a direct link to the analysis results.
- Deleted the unnecessary doc folder after configuring GitHub Pages to serve from the "root."
- Added a Creative Commons License to the repository.
- Addressed data leakage by performing EDA only on the training data after the split.
- Updated the Code of Conduct to include a team email under the "Enforcement" section.
- Improved the download_customer_data.py script by making attribute names more descriptive and adding documentation.
- Fixed the environment configuration by pinning package versions and creating a platform-compatible lockfile.
- Moved bank-full.csv into the raw folder and updated scripts and documentation accordingly.
If any feedback was partially addressed or pending, please indicate the status and next steps here.