CHANGELOG

This document outlines the improvements made to the project based on feedback received, including references to specific evidence such as commits, pull requests, or lines of code. Each section includes narration to help identify how the changes address the feedback.

Feedback Summary

1. Update README File

Feedback Description:
No link was provided in the README file to the analysis results.

Changes Made:
Updated the README file to include a link to the analysis HTML file hosted on GitHub Pages:
https://ubc-mds.github.io/customer-term-deposits-predictor/analysis/customer-term-deposits-predictor.html
Changed the GitHub Pages settings to serve from the "root" so that the HTML file in the analysis folder can be linked directly.

Evidence:

Commit Message: Link added

2. Delete `doc` Folder

Feedback Description:
The doc folder was unnecessary after updating GitHub Pages to serve from the "root."

Changes Made:
Deleted the doc folder from the repository.

Evidence:

Commit Message: Removed the doc folder

3. Update License File

Feedback Description:
No Creative Commons License was specified for the project report, as noted in the Milestone 1 feedback.

Changes Made:
Added a Creative Commons License to the project repository. Followed an example license from Tiffany's GitHub repository.

Evidence:

Commit Message: Link to commit updating license

4. Address Data Leakage in EDA

Feedback Description:
Milestone 1 feedback highlighted a violation of the "golden rule" by performing EDA before splitting the dataset, potentially causing data leakage.

Changes Made:
Refactored the workflow to ensure EDA is performed only on the training dataset after the data split.
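The ordering described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's code: the helper `split_rows`, the toy records, the seed, and the 80/20 ratio are all assumptions made for the example.

```python
import random
from collections import Counter


def split_rows(rows, test_fraction=0.2, seed=123):
    """Shuffle a copy of the rows and split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy records standing in for the raw customer data.
rows = [{"age": 20 + i, "y": "yes" if i % 4 == 0 else "no"} for i in range(20)]

# 1. Split first, so the test rows are set aside before any exploration.
train_rows, test_rows = split_rows(rows)

# 2. Explore the training rows only (here, a simple class-balance count).
train_class_counts = Counter(row["y"] for row in train_rows)
print(train_class_counts)
```

Because the test rows are never inspected, nothing learned during EDA can influence decisions that are later evaluated on them.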

Evidence:

Commit Message: Refactored EDA to prevent data leakage

5. Add Code of Conduct

Feedback Description:
The email address under the "Enforcement" section of the Code of Conduct should be tied to the team.

Changes Made:
Updated the Code of Conduct to include a team email under the "Enforcement" section.

Evidence:

Commit Message: Add Code of Conduct

6. Fix `download_customer_data.py` Script

Feedback Description:
The names of the attributes passed to download_customer_data.py were not descriptive enough: they only listed different paths, which lacked clarity. Milestone 1 feedback suggested giving these attributes more descriptive names.

Changes Made:
Updated the download_customer_data.py script to include clear and descriptive path names for attributes. Added detailed documentation to the script for better usability and clarity.
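As an illustration of what descriptive, documented script arguments can look like, here is a small sketch using the standard-library `argparse` module. The option names `--url` and `--raw-data-dir`, the help texts, and the example values are hypothetical; the changelog does not state which names or which command-line library the actual script uses.

```python
import argparse


def build_parser():
    """Build a command-line parser whose option names say what each path is for."""
    parser = argparse.ArgumentParser(
        description="Download the raw customer data and save it locally."
    )
    parser.add_argument(
        "--url",
        required=True,
        help="Address of the archive to download.",
    )
    parser.add_argument(
        "--raw-data-dir",
        required=True,
        help="Directory where the downloaded file is written.",
    )
    return parser


# Parse an example command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--url", "https://example.com/bank.zip", "--raw-data-dir", "data/raw"]
)
print(args.url, args.raw_data_dir)
```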

Evidence:

Commit Message: Improved attribute naming and added documentation

7. Fix Environment Configuration

Feedback Description:

Pinned package versions missing: Almost none of the packages in environment.yml were pinned with specific versions.
Platform-specific lockfile: The lockfile was created for osx-arm64 and wasn't compatible with other platforms.

Changes Made:

Added version pinning to all packages in environment.yml to ensure consistent environments across different setups.
Updated the lockfile to support multiple platforms (e.g., Linux, Windows, and macOS).
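One way to guard against unpinned packages creeping back in is a small check over the text of environment.yml. The sketch below is an assumption-laden illustration, not part of the project: `find_unpinned` is a hypothetical helper, it uses simple line-based parsing instead of a YAML library, and the package versions in the example are made up.

```python
def find_unpinned(environment_yml_text):
    """Return dependency entries that carry no version constraint."""
    unpinned = []
    in_dependencies = False
    for line in environment_yml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line.startswith((" ", "-")):
            # A new top-level key starts or ends the dependencies section.
            in_dependencies = stripped == "dependencies:"
            continue
        if in_dependencies and stripped.startswith("- "):
            entry = stripped[2:].strip()
            if entry.endswith(":"):
                continue  # nested section header such as "pip:"
            if not any(op in entry for op in ("=", "<", ">")):
                unpinned.append(entry)
    return unpinned


# Hypothetical environment file; names and versions are for illustration only.
example = """\
name: example_env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pandas
  - pip:
    - requests==2.31.0
"""
print(find_unpinned(example))
```

Here only `pandas` is reported, since every other dependency carries a version constraint.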

Evidence:

Commit Message: Pinned package versions and updated lockfile

8. Categorize `bank-full.csv` into Processed or Raw Folder

Feedback Description:
The file bank-full.csv was standalone and not categorized into either the processed or raw folder.

Changes Made:
Moved bank-full.csv into the raw folder, as it represents raw input data.

Evidence:

Commit Message: Moved bank-full.csv

8. Add function docstring

Feedback Description:
Adding docstrings for each function would make the code easier to understand and use.

Changes Made:
Added docstrings to the functions in the src folder.
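A docstring of the kind requested might look like the sketch below. The function `count_missing` is hypothetical and is not taken from the src folder; the NumPy-style section layout is one common convention, assumed here for illustration.

```python
def count_missing(rows, column):
    """Count the rows in which a column is missing or empty.

    Parameters
    ----------
    rows : list of dict
        Records keyed by column name.
    column : str
        Name of the column to inspect.

    Returns
    -------
    int
        Number of rows where `column` is absent, None, or an empty string.

    Examples
    --------
    >>> count_missing([{"job": "admin"}, {"job": ""}, {}], "job")
    2
    """
    return sum(1 for row in rows if row.get(column) in (None, ""))
```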

Evidence:

src folder link: link to src folder with all functions and their docstrings

9. Error handling

Feedback Description: Consider adding checks for common issues in scripts, such as missing input files or directories.

Changes Made: This was completed as part of the Milestone 4 requirements. Tests were added to a tests file for all functions.
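The kind of check the feedback describes can be sketched as follows. `check_paths` is a hypothetical helper written for this example, not a function from the project; it only shows one way to fail early with a clear message when an input file or output directory is missing.

```python
from pathlib import Path


def check_paths(input_file, output_dir):
    """Fail early with a clear message if the input file or output directory is missing."""
    input_path = Path(input_file)
    output_path = Path(output_dir)
    if not input_path.is_file():
        raise FileNotFoundError(f"Input file not found: {input_path}")
    if not output_path.is_dir():
        raise NotADirectoryError(f"Output directory not found: {output_path}")
    return input_path, output_path


# Example: a missing input file is reported instead of failing later in the script.
try:
    check_paths("no_such_input_file.csv", ".")
except FileNotFoundError as error:
    print(error)
```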

Evidence:

test folder link: link to test folder

10. Validation script

Feedback Description: The README instructions don't run validate.py. A person who is trying to reproduce this analysis should run the same validation to ensure the input data is correct.

Changes Made: The validation script was changed into a function and incorporated into the preprocessed script for better code flow.
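The pattern of calling validation from inside the preprocessing step can be sketched like this. Everything here is assumed for illustration: the `validate` and `preprocess` functions, the required columns, and the placeholder cleaning step are hypothetical and do not reproduce the project's actual checks.

```python
import csv
import io

# Hypothetical required columns; the real schema is not given in this changelog.
REQUIRED_COLUMNS = {"age", "job", "y"}


def validate(rows):
    """Raise ValueError if the data is empty or a required column is missing."""
    if not rows:
        raise ValueError("Input data is empty.")
    missing = REQUIRED_COLUMNS - set(rows[0])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")


def preprocess(csv_text):
    """Read the data, validate it, then apply a placeholder cleaning step."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    validate(rows)  # validation always runs before any processing
    return [{**row, "job": row["job"].strip().lower()} for row in rows]


print(preprocess("age,job,y\n30, Admin ,no\n"))
```

With this arrangement, anyone who runs the preprocessing step also runs the validation, so no separate command is needed in the README.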

Evidence:

Commit Message: Validate function called in preprocessed script

11. Quarto Render issues in Docker

Feedback Description: The report could not be rendered using the container environment.

Changes Made: The README file was updated to provide more specific instructions for running the environment in the Docker container.

Evidence:

Commit Message: README file instructions updated

Summary of Improvements

Updated README file with a direct link to the analysis results.
Deleted unnecessary doc folder after configuring GitHub Pages to serve from the "root."
Added a Creative Commons License to the repository.
Addressed data leakage by performing EDA only on training data post-split.
Updated the Code of Conduct to include a team email under the "Enforcement" section.
Improved the download_customer_data.py script by making attribute names more descriptive and adding documentation.
Fixed environment configuration by pinning package versions and creating a platform-compatible lockfile.
Categorized bank-full.csv into the raw folder and updated scripts and documentation accordingly.

Additional Notes

If any feedback was partially addressed or pending, please indicate the status and next steps here.
