Applied Data Science

Master of Management in Clinical Informatics (MMCi) Program

Course Director: Matthew Engelhard
Matt's office hours: TBD
Teaching Assistant: Andrea Burton
Andrea's office hours: TBD

Course Materials

Please review the syllabus by clicking here
Materials for each course weekend are linked in the Schedule below
We recommend you use Google Chrome when browsing this site or working in Google Colab
A glossary of terms related to the course material is available here and will be updated before each course weekend. Note that it is not necessary to memorize these terms; they are listed only for your reference.

Readings and Quizzes

There will be a brief quiz due before each weekend except weekend 6.
Each quiz (see Schedule) links to one or two articles that should be read before taking it.
All articles are also available as .pdf in the Resources section of Sakai.
Your answers must be entered in the Tests and Quizzes section of Sakai before the beginning of class.
You may take each quiz as many times as you like prior to the deadline.

Group Assignments

At the beginning of the course, you will choose one of two pathways for your group assignments.
Both pathways require four assignments in total, one due at the beginning of each of course weekends 2-5. Each assignment applies a specific machine learning method studied in class to a clinical or healthcare problem.
Students choosing the model development pathway will learn to work with health-related datasets and train and evaluate predictive models by modifying Python code in a series of Jupyter notebooks.
Students choosing the model evaluation pathway will learn to critically evaluate machine learning models presented in the clinical literature by rigorously analyzing a series of clinical papers.

Pathway 1: Model Development

Assignments will guide you through the model development process, from loading and preprocessing data to training and evaluating models.
Students choosing this pathway should either (a) have prior experience working in Python, or (b) have prior experience working in another scientific computing language (e.g., R, MATLAB) and be willing to learn enough Python syntax to modify and extend code blocks in the model development assignments. If you are not sure, please look at the posted model development assignments to get a better feel for what will be required.
To complete the assignments, you may either install Anaconda on your computer or work in Google Colaboratory, which allows you to write and execute Python code in your browser. Anaconda has a steeper learning curve than Colaboratory. We recommend Colaboratory for those with less experience, and Anaconda for those who are already familiar with it or feeling adventurous.
Recommended Python resources include the Duke Library tutorials, the Python Crash Course book, and Google's Python Class.
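
The model development process described above (load data, preprocess it, train a model, evaluate it) can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not code from the course notebooks. It assumes only that NumPy is installed and uses a synthetic dataset as a stand-in for real clinical data. Logistic regression is fit by plain gradient descent. The helper names (`split_data`, `standardize`, `fit_logistic_regression`, `performance`) are our own and do not correspond to any library API.

```python
import numpy as np

def split_data(X, y, test_frac=0.25, seed=0):
    """Randomly hold out a fraction of samples for evaluation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def standardize(X_train, X_test):
    """Scale features using statistics from the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std

def predict_proba(X, w, b):
    """Logistic (sigmoid) function applied to the linear score."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def fit_logistic_regression(X, y, lr=0.1, n_iter=2000):
    """Fit weights and bias by gradient descent on the mean log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        grad = predict_proba(X, w, b) - y  # gradient of log loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def performance(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# 1. Load data: synthetic stand-in for a tabular clinical dataset (not course data).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = (X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=400) > 0).astype(int)

# 2. Preprocess: hold out a test set, then standardize features.
X_tr, X_te, y_tr, y_te = split_data(X, y)
X_tr, X_te = standardize(X_tr, X_te)

# 3. Train, then 4. evaluate on the held-out test set.
w, b = fit_logistic_regression(X_tr, y_tr)
metrics = performance(y_te, predict_proba(X_te, w, b))
print(metrics)
```

Feature scaling uses training-set statistics only, so no information from the held-out test set leaks into training. In practice, a library such as scikit-learn would typically handle these steps.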

Pathway 2: Model Evaluation

Assignments will train you to critically evaluate machine learning models presented in the clinical literature by briefly answering questions related to (a) the data source, (b) model development, (c) model evaluation, and (d) model deployment.
You will answer the same set of questions for a clinical paper in each of the following four areas:
- methods for tabular clinical data
- methods for electronic health record data
- computer vision for medical imaging
- natural language processing for clinical text
Questions are provided in the model evaluation questionnaire. Please note that not all questions apply to each paper, and a brief list of questions to omit for a given paper will be provided prior to the assignment.

Final Project

The course will culminate in a design project in which you propose to apply data science methods to a clinical topic of your choosing.
Project instructions and grading details are here.
Proposals are due before class on weekend 3, and the project is due before class on weekend 6.

Course Schedule

Weekend 1: Evaluating Predictive Models
Before Class:
- Review Course Site
- Read Obermeyer and Emanuel, 2016
- Read Chen and Asch, 2018
- Complete Quiz 1 in Sakai
During Class:
- Asynchronous
  - AL1: Predictive Models
  - AL2: Logistic Regression
- Saturday
  - Lecture: Intro to Health DS
  - Activity: Understanding Logistic Regression (key)
  - Lecture: Performance Measures
After Class:
- Model Development Pathway
  - CE1: Welcome to the Jupyter Notebook
  - CE2: Visualizing Features of Breast Cancer Samples
- Model Evaluation Pathway
  - Evaluate Khera et al., 2021

Weekend 2: Learning in Neural Networks
Before Class:
- Read Engelhard et al., 2021
- Complete Quiz 2 in Sakai
During Class:
- Friday
  - Activity: Calculating Performance Measures
  - Lecture: Multilayer Perceptron
- Saturday
  - Activity: Understanding the MLP (key)
  - Lecture: The Model Development Process
After Class:
- Model Development Pathway
  - CE3: Predicting malignancy from features of breast cancer samples
  - CE4: Exploring Overfitting
- Model Evaluation Pathway
  - Evaluate Tomašev et al., 2019

Weekend 3: Medical Image Analysis
Before Class:
- Read Hinton, 2018
- Read Wilson, 2019
- Complete Quiz 3 in Sakai
During Class:
- Asynchronous
  - Model Learning
  - AL4: Motivating CNNs
  - AL5: Spatial Convolution
  - AL6: Deep CNNs
- Saturday
  - Model Learning, in Brief
  - Activity: Guess & Check Regression
  - Medical Image Analysis
After Class:
- Model Development Pathway
  - CE5: Identifying Handwritten Digits
  - CE6: Better MNIST Predictions
  - CE7 (Optional): Transfer Learning
- Model Evaluation Pathway
  - Evaluate Esteva et al., 2017

Weekend 4: Biomedical Text Processing
Before Class:
- Complete Final Project Proposal
- Read Hirschberg and Manning, 2015
- Complete Quiz 4 in Sakai
During Class:
- Friday
  - Lecture + Activity: Protecting Against Overfitting
  - Lecture: Intro to NLP and Bag of Words Models
  - Lecture: Biomedical NLP in Practice
- Saturday
  - Discussion: Healthcare Applications of NLP
  - Activity: Building Text Features
  - Eric Poon Guest Lecture
After Class:
- Model Development Pathway
  - CE8: Text Pre-Processing
  - CE9: Bag of Words Models
  - CE10 (Optional): A Simple Word Embedding Based Model
- Model Evaluation Pathway
  - Evaluate Taggart et al., 2018

Weekend 5: Working with Multi-Modal Health Data
Before Class:
- Watch Beede et al.
- Complete Quiz 5 in Sakai
During Class:
- Asynchronous
  - Lecture: Multi-Modal Health Data
  - Optional Lecture: Learning Word Embeddings
- Saturday
  - Lecture: Understanding Model Predictions
  - Activity: Revisiting the Model Development Process
  - Lecture: Wrapping Up
After Class:
- Complete Final Project

Weekend 6: Course Projects
Before Class:
- Final Project Report
During Class:
- Beyond Supervised Learning
After Class:
- Graduate!
