This repository contains the materials for D-Lab’s Python SQL Intermediate workshop.
We recommend completing SQL Fundamentals or having equivalent experience with:
- Basic SQL syntax (SELECT, FROM, WHERE)
- Reading and querying single tables
Check out D-Lab’s Workshop Catalog to browse all workshops, see what’s running now, and review prerequisites.
This intermediate SQL workshop builds on foundational skills to deepen your ability to work with relational databases in Python. We focus on more advanced querying techniques essential for real-world data analysis. You’ll gain hands-on experience with multi-table operations, subqueries, and window functions. These are key tools for structuring powerful and efficient queries.
After completing SQL Intermediate, you will be able to:
- Combine data from multiple tables using different types of JOINs (INNER, LEFT, SELF).
- Understand the role of primary and foreign keys in establishing table relationships.
- Write and use subqueries to break down complex queries with multiple logical steps.
- Simplify complex queries using Common Table Expressions (CTEs).
- Transform data between row and column orientations with pivoting and unpivoting techniques.
- Apply window functions to perform calculations across specified sets of rows.
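To give a sense of how several of these techniques fit together, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the sample rows are hypothetical and are not taken from the workshop materials. The sketch assumes the SQLite bundled with your Python is version 3.25 or newer, which window functions require.

```python
import sqlite3

# In-memory database with two hypothetical tables (not from the workshop data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- primary key
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),  -- foreign key
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
""")

query = """
WITH totals AS (                       -- CTE: total spend per customer
    SELECT c.name,
           COALESCE(SUM(o.amount), 0.0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o              -- LEFT JOIN keeps customers with no orders
           ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
)
SELECT name,
       total,
       RANK() OVER (ORDER BY total DESC) AS spend_rank   -- window function
FROM totals
ORDER BY spend_rank;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

The LEFT JOIN keeps Linus in the result even though he has no orders, the CTE names the aggregation step so the final SELECT stays short, and RANK() assigns each customer a position without collapsing the rows.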
This workshop does not cover:
- Basics of SQL (see SQL Fundamentals)
- Interfacing with cloud databases (see Cloud SQL Databases)
- SQL performance optimization and indexing
SQL Intermediate is a 2-hour workshop. It follows a lecture-style coding walkthrough with interspersed challenge problems and a short break. Instructors and TAs are available throughout to guide your learning and answer questions in accessible language.
Before attending the workshop, you should install Python and Jupyter on your computer. If you need help, please submit a consulting request with D-Lab prior to the start of the workshop.
Then follow the steps in the installation instructions notebook.
If you do not have Anaconda installed and the materials loaded on your computer by the time the workshop starts, we strongly recommend using the UC Berkeley DataHub to run the materials for these lessons. You can access the DataHub by clicking this button:
The DataHub downloads this repository, along with any necessary packages, and allows you to run the materials in a Jupyter notebook that is stored on UC Berkeley's servers. No installation is necessary on your end; you only need an internet browser and a CalNet ID to log in. By using the DataHub, you can save your work and come back to it at any time. When you want to return to your saved work, go straight to DataHub, sign in, and click on the SQL-Fundamentals folder.
If you don't have a Berkeley CalNet ID, you can still run these lessons in Binder, which is another cloud-based option. Click this button:
Note: Using Binder, you unfortunately cannot save your work.
D-Lab works with Berkeley faculty, research staff, and students to advance data-intensive social science and humanities research. Our goal at D-Lab is to provide practical training, staff support, resources, and space to enable you to use Python for your own research applications. Our services cater to all skill levels, and no programming, statistical, or computer science background is necessary. We offer these services in the form of workshops, one-to-one consulting, and working groups that cover a variety of research topics, digital tools, and programming languages.
Visit the D-Lab homepage to learn more about us. You can view our calendar for upcoming events, learn about how to utilize our consulting and data services, and check out upcoming workshops.
Here are other Python workshops offered by the D-Lab:
- Python Fundamentals
- Python GPT Fundamentals
- Python Data Wrangling
- Python Data Visualization
- Geospatial Fundamentals in Python
- Bruno Smaniotto
- Tom van Nuenen