By Filipa Calado
Pratt Institute School of Information
This website offers a series of Python lessons organized into different sections, like "Introduction to Python Fundamentals" and "Python for Data Cleaning". These materials introduce participants to the Python programming language for working in cultural heritage contexts like libraries, archives, and museums.
1: “Introduction to Python Fundamentals”
- Offers a basic introduction to core concepts in Python programming, grounded in a critical awareness of data and of what happens to data at various levels of transformation and abstraction.
2: “Python for Web Scraping and APIs”
- Introduces the ethics, legality, and programmatic methods of extracting data from the web. Builds on core concepts from the introductory session (like loops and conditional statements) and adds new concepts on object-oriented programming and working with Python libraries. Participants practice scraping metadata from current “anti-trans” bills in the USA.
- libraries: requests, bs4, and pandas
3: “Python for Data Cleaning”
- Experiments with approaches for wrangling text data into formats for analysis, with emphasis on removing unwanted elements that may skew analysis. While building on skills for writing loops and conditional statements and working with external libraries, participants will learn to write functions and scripts for running customized text cleaning processes.
- libraries: pandas, spacy
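As a rough illustration of the kind of cleaning function this session works toward, here is a minimal sketch. It is not taken from the workshop materials: the function name `clean_text`, the sample sentence, and the stopword set are all hypothetical, and it uses only Python's built-in `re` module rather than pandas or spacy so that it runs without installing anything.

```python
import re


def clean_text(raw, stopwords=None):
    """Lowercase text, strip tag-like debris and non-letters, drop stopwords.

    A hypothetical helper for illustration only.
    """
    stopwords = stopwords or set()
    # Remove anything that looks like an HTML tag.
    text = re.sub(r"<[^>]+>", " ", raw)
    # Lowercase, then replace every character that is not a letter
    # or whitespace (digits, punctuation) with a space.
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())
    # Loop over the words and keep only those not in the stopword set.
    tokens = []
    for word in text.split():
        if word not in stopwords:
            tokens.append(word)
    return " ".join(tokens)


# Made-up sample text, not drawn from any real bill.
sample = "<p>SECTION 1. The Board SHALL adopt rules, effective 2024.</p>"
cleaned = clean_text(sample, stopwords={"the"})
print(cleaned)
```

The sketch combines a loop, a conditional statement, and a function, the three skills this session brings together. Which elements count as "unwanted" depends on the analysis: numbers and punctuation are dropped here, but a project that tracks section numbers or dates would want to keep them.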
4: “Python for Data Analysis”
- Explores methods for finding and analyzing textual patterns through popular tasks in Natural Language Processing. Participants practice writing code to annotate and extract text according to specific features from current “anti-trans” bills in the USA.
- libraries: spaCy
5: “Python for Machine Learning”
- Using the anti-trans bills data that they prepared in previous workshops, participants practice fine-tuning a small text generation model and learn how to use machine learning for research.
- libraries: transformers
6: “Python for Publishing”
- In this session, participants learn to use Jekyll and GitHub Pages to publish their project as a website that others can access on the internet.
This curriculum is inspired by the Graduate Center Digital Initiatives Digital Humanities Research Institute Python workshop.
The opening challenge takes text from the Feminist Data Manifest-No by M. Cifor, P. Garcia, et al.
For more instruction with Python, please see these books:
- WJB Mattingly's Introduction to Python for Humanists
- Melanie Walsh's Introduction to Cultural Analytics & Python
All of the above workshops were first developed and piloted at the Princeton University Library in the 2023-2024 academic year. Thank you to Princeton students, faculty, and staff for their generous participation and suggestions.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.