authors: Shannon Pflueger, Nelli Hovhannisyan, Joseph Lim
In this project we compare multiple classification models to predict a pregnant woman's maternal health risk as low, medium or high from her health data. With our chosen model we aim to identify key indicators that predict higher maternal health risk. Maternal health broadly refers to the overall health of women during pregnancy, childbirth and the postnatal period (World Health Organization, 2024). A variety of complications can arise during pregnancy, childbirth and soon after that result in maternal death. The World Health Organization (1992) defines maternal mortality as "the death of a woman whilst pregnant or within 42 days of delivery or termination of pregnancy, from any cause related to, or aggravated by pregnancy or its management, but excluding deaths from incidental or accidental causes". Thus, maternal health risk refers to the approximate risk level of a woman's health while pregnant or soon after birth. In rural communities where it is costly and difficult to provide consistent medical care, a way to predict maternal health risk from minimally invasive measurements could greatly improve health outcomes for mothers and babies alike.

After some initial exploration of classification models we settled on the Decision Tree algorithm for its easily interpretable model and relatively high accuracy score. Based on this Decision Tree model, a key indicator of increased maternal health risk was blood sugar. The decision tree our model built from the dataset suggests that blood sugar higher than 7.95 mmol/l is associated with a high maternal health risk. However, the model does seem to struggle to distinguish low from medium maternal health risk. While this is a great first step, the model is not yet accurate enough to be used in a medical context.
More research and modelling (perhaps with different models) should be done before deploying this technology in rural communities. Additionally, it should be noted that since this dataset contains health data only from Pima Indians, the model may have learned a bias specific to unknown genetic factors present in this sample. Thus, a much larger dataset with a diverse sample should be used to train the model before it is used in any community.
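To make the blood sugar threshold above more concrete, the sketch below shows how a single decision tree split can be chosen by minimizing weighted Gini impurity. It is a simplified, pure-Python illustration and not the code used in this project: the function names `gini` and `best_threshold` are hypothetical, the readings and labels are made up for the example, and a full decision tree repeats this search over every feature and every resulting subset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    # Candidate thresholds are the midpoints between consecutive distinct values.
    for (left_val, _), (right_val, _) in zip(pairs, pairs[1:]):
        if left_val == right_val:
            continue
        threshold = (left_val + right_val) / 2
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up blood sugar readings (mmol/l) and risk labels, for illustration only.
blood_sugar = [6.1, 6.8, 7.0, 7.5, 7.9, 8.0, 9.0, 11.0, 15.0]
risk = ["low", "low", "medium", "low", "medium", "high", "high", "high", "high"]

threshold, impurity = best_threshold(blood_sugar, risk)
print(f"split at {threshold:.2f} mmol/l, weighted Gini {impurity:.3f}")
```

On this toy data the best split falls midway between the readings 7.9 and 8.0. It isolates every "high" label on one side but leaves "low" and "medium" mixed on the other, which mirrors the difficulty with low vs. medium risk described above.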
The dataset used in this project was originally from the Pima Indians Diabetes Database (Pima Indians Diabetes Database, 2024). The dataset was sourced from the UCI Machine Learning Repository (Dua & Graff, 2017) and can be found here, or more specifically this file. Classification for each observation in the dataset was done with help from Dr. Shirin Shabnam (Ahmed, Kashem, Rahman, & Khatun, 2020).
A link to our report can be found here
If you are using Windows or Mac, make sure Docker Desktop is running.
- Clone this GitHub repository (git clone repo URL)
- Navigate to the root of this project on your computer using the command line and enter the following command:
docker compose up --build
- In the terminal, look for a URL that starts with
http://127.0.0.1:8888/lab?token=
- From the root of the project run the following command to reset the project to a clean state (removing all files generated by previous runs of the analysis):
make clean
- To run the analysis from start to end, run the following command in the terminal in the project root:
make all
- To shut down the container and clean up the resources, type Ctrl + C in the terminal where you launched the container, and then type:
docker compose rm
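If the container output has been captured as text, the step above that asks you to look for the token URL can also be done programmatically. The sketch below is one possible way to do that in Python. The function name `find_lab_url` and the sample log are hypothetical, and the pattern assumes the token is made of hexadecimal characters; real container output may look different.

```python
import re

# URL prefix taken from the step above; the hexadecimal token is an assumption.
TOKEN_URL = re.compile(r"http://127\.0\.0\.1:8888/lab\?token=[0-9a-fA-F]+")


def find_lab_url(log_text):
    """Return the first Jupyter Lab URL with a token in log_text, or None."""
    match = TOKEN_URL.search(log_text)
    return match.group(0) if match else None


# Made-up log excerpt for illustration only.
sample_log = """
Jupyter Server is running at:
    http://127.0.0.1:8888/lab?token=0123abcd
"""

print(find_lab_url(sample_log))
```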
- conda (version 23.9.0 or higher)
- conda-lock (version 2.5.7 or higher)
- Add the dependency to the environment.yaml file on a new branch.
- Run the following command to update the conda-linux-64.lock file:
conda-lock -k explicit --file environment.yaml -p linux-64
- Re-build the Docker image locally to ensure it builds and runs properly.
- Push the changes to GitHub. A new Docker image will be built and pushed to Docker Hub automatically. It will be tagged with the SHA for the commit that changed the file.
- The docker-compose.yml file will be updated automatically with GitHub Actions.
- Send a pull request to merge the changes into the main branch.
Use the same docker compose up command as described in the Running the analysis section above to launch Jupyter Lab, if it's not already launched. Tests are run using the pytest command in the root of the project. More details about the test suite can be found in the tests directory.
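As a rough illustration of the kind of check pytest collects, the sketch below defines a hypothetical helper and a test for it. Neither `is_valid_risk_level` nor the test comes from this repository, and the label strings are assumed from the low, medium and high risk levels described above; the actual tests live in the tests directory.

```python
# Hypothetical helper, not part of this repository.
VALID_LEVELS = {"low", "medium", "high"}


def is_valid_risk_level(label):
    """Return True if label is one of the assumed risk level names."""
    return isinstance(label, str) and label.strip().lower() in VALID_LEVELS


# pytest collects functions whose names start with "test_" by default.
def test_is_valid_risk_level():
    assert is_valid_risk_level("low")
    assert is_valid_risk_level(" High ")
    assert not is_valid_risk_level("unknown")
    assert not is_valid_risk_level(3)
```

Saved in a file whose name starts with test_, a function like this would be picked up when the pytest command is run.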
The Maternal Health Risk report contained herein is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information. If re-using/re-mixing please provide attribution and link to this webpage. The software code contained within this repository is licensed under the MIT license. See the license file for more information.
World Health Organization. (2024, December 7). Maternal health. World Health Organization. https://www.who.int/health-topics/maternal-health#tab=tab_1
World Health Organization. (1992). International classification of diseases and related health problems. World Health Organization.
Pima Indians Diabetes Database. (2024, November 22). Pima Indians Diabetes Dataset. Retrieved from https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.names
Dua, D., & Graff, C. (2017). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml
Ahmed, M., Kashem, M. A., Rahman, M., & Khatun, S. (2020). Review and analysis of risk factors of maternal health in remote areas using the Internet of Things (IoT). Lecture Notes in Electrical Engineering, 632, 1-10. https://doi.org/10.24432/C5DP5D