Authors: Julian Daduica @jdaduica, Stephanie Ta @Stephanie-Ta, and Wai Ming Wong @waiming
This study attempts to build a classification model using logistic regression to predict whether an online shopper will make a purchase based on their website interaction behaviour. The final classifier achieved an accuracy of 87.6% on an unseen test dataset, compared with 83.5% for a dummy classifier that always predicts no purchase. While the logistic regression model performed reasonably well, it did not account for the class imbalance in the dataset, where the purchase class was significantly less frequent than the no-purchase class. From our logistic regression model, we identified the features PageValue and ExitRate as the most important when making predictions, which suggests that they are the most informative for determining whether a customer will make a purchase. The model can provide insight for businesses looking to increase revenue by targeting and optimizing these features in marketing or sales campaigns. Further research that addresses the class imbalance and explores alternative models or algorithms could improve predictions and make the model more useful to businesses.
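To make the comparison with the dummy classifier concrete, here is a minimal sketch of the arithmetic behind it. It is not the project's code: the label list is a hypothetical 1000-session test set built only to match the 83.5% no-purchase rate implied by the baseline accuracy, and the helper names `majority_baseline_accuracy` and `recall` are illustrative.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def recall(labels, predictions, positive):
    """Fraction of true `positive` examples that were predicted as `positive`."""
    true_pos = sum(
        1 for y, p in zip(labels, predictions) if y == positive and p == positive
    )
    actual_pos = sum(1 for y in labels if y == positive)
    return true_pos / actual_pos


# Hypothetical test set (assumption): 835 sessions without a purchase and
# 165 with one, chosen only to reproduce the reported 83.5% baseline.
labels = [False] * 835 + [True] * 165
dummy_predictions = [False] * len(labels)  # always predict "no purchase"

baseline = majority_baseline_accuracy(labels)
print(f"baseline accuracy: {baseline:.3f}")
print(f"dummy recall on purchases: {recall(labels, dummy_predictions, True):.3f}")
print(f"gain of an 87.6% model over baseline: {0.876 - baseline:.3f}")
```

The baseline scores well on accuracy while never identifying a single purchase, which is why accuracy alone can understate the effect of class imbalance.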
The final report can be found here
We are using a Docker virtual container so that our computational environment is reproducible. Please ensure that Docker Desktop is running while replicating our analysis if you are using Windows or Mac.
To replicate our analysis:
- Clone this GitHub repository to your local machine and navigate to the project root.
- Launch the virtual container by running the command `docker compose up` in terminal.
- To open JupyterLab, copy the URL that appears in terminal starting with `http://127.0.0.1:8888/lab?token=` and paste it into your browser.
- Reset our project by removing all files generated by our analysis via the command `make clean` in terminal in the project root.
- Rerun our analysis and remake the files generated by our analysis by running the command `make all` in terminal in the project root.
To exit and clean up the container:
- Press `Ctrl` + `C` in terminal where you launched the container.
- Run the command `docker compose rm` in terminal.
If you would like to add another package to the container, please install the following dependencies and then follow the instructions below.
- `conda` (version 24.7.1 or higher)
- `conda-lock` (version 2.5.7 or higher)
- `mamba` (version 1.5.8 or higher)
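
If you want to confirm that your installed tools meet the minimum versions listed above, a small helper along these lines may be useful. It is a sketch and not part of this repository: the function names are hypothetical, and it assumes each tool accepts `--version` and prints a version number of the form `X.Y.Z`.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the dependency list above.
MINIMUM_VERSIONS = {
    "conda": (24, 7, 1),
    "conda-lock": (2, 5, 7),
    "mamba": (1, 5, 8),
}


def parse_version(text):
    """Return the first X.Y.Z number found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def meets_minimum(found, minimum):
    """True if a parsed version tuple is at least the required minimum."""
    return found is not None and found >= minimum


def check_tools():
    """Map each tool name to True/False depending on whether it passes the check."""
    results = {}
    for tool, minimum in MINIMUM_VERSIONS.items():
        path = shutil.which(tool)
        if path is None:  # tool is not on PATH
            results[tool] = False
            continue
        # Assumption: the tool reports its version via `--version`.
        completed = subprocess.run(
            [path, "--version"], capture_output=True, text=True
        )
        found = parse_version(completed.stdout + completed.stderr)
        results[tool] = meets_minimum(found, minimum)
    return results
```

Calling `check_tools()` returns a dictionary such as `{"conda": True, ...}`; a `False` entry means the tool is missing, too old, or reported a version the helper could not parse.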
- Create a new branch and add the new package with its version pinned in `environment.yaml`.
- Update the `conda-linux-64.lock` file by running the command `conda-lock -k explicit --file environment.yaml -p linux-64` in terminal.
- Ensure the Docker image runs properly by re-building the image locally.
- If the image runs properly, push your changes to GitHub and the new image will be published on DockerHub automatically.
- Update `docker-compose.yml` to use the new image by changing the tag (`image: stephanieta/dsci522-online-shopping-project:<tag>`).
- Make a pull request to merge your changes to the `main` branch.
The code of this project is licensed under the terms of the MIT license.
If re-using/re-mixing please provide attribution and link to this webpage.
The project report is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.