Introduction to Web Scraping with Python (bs4)

This workshop introduces web scraping with the Python library bs4. It can be taught asynchronously or synchronously, either as a standalone workshop or as part of the Python track of the digital institutes.

Credits

This workshop was written by Filipa Calado.

Workshops

It was first taught at the CUNY Graduate Center (GC) by Filipa Calado in spring 2021 as a two-hour online synchronous workshop.

Abstract

This workshop covers how to scrape the web using the Python library Beautiful Soup 4 (bs4). In short, bs4 is a Python library for "web scraping," or pulling data out of HTML and XML files. In this workshop, we will use bs4 to scrape news data from the New York Times website. By the end of the workshop, you will have a Python script that grabs data from a website and exports it to a CSV file. At the very end, I will show you a couple of other approaches, beyond bs4, for scraping social media.
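The workflow described in the abstract (download a page, parse it with bs4, export the results to CSV) can be sketched roughly as follows. This is a minimal illustration, not the workshop's own script. The front-page URL, the choice to collect every link's text and address, the browser-like User-Agent header, and the helper names `extract_links`, `save_to_csv`, and `scrape` are all assumptions. The New York Times changes its markup often and may restrict automated access, so expect to adjust what gets extracted.

```python
import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Assumption: scrape the front page; the site's markup is not a stable API.
URL = "https://www.nytimes.com/"


def extract_links(html, base_url):
    """Return (text, absolute_url) pairs for every <a href="..."> in the HTML."""
    # "html.parser" is Python's built-in parser; use "lxml" if it is installed.
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for a in soup.find_all("a", href=True):
        text = a.get_text(strip=True)
        if text:  # skip icon-only or empty links
            # Resolve relative hrefs such as "/2021/..." against the page URL.
            rows.append((text, urljoin(base_url, a["href"])))
    return rows


def save_to_csv(rows, path):
    """Write the (text, url) rows to a CSV file with a header line."""
    # newline="" lets the csv module control line endings, as its docs advise.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "url"])
        writer.writerows(rows)


def scrape(url=URL, path="links.csv"):
    """Fetch the page, extract its links, and save them to a CSV file."""
    # Hypothetical browser-like User-Agent; some sites reject the default one.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    rows = extract_links(response.text, url)
    save_to_csv(rows, path)
    return rows
```

Calling `scrape()` downloads the page and writes `links.csv`. Keeping parsing (`extract_links`) separate from fetching means the parsing step can be tried offline on a saved page or a short HTML string.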

Requirements

Students need to be familiar with the Python language, having completed the Introduction to Python workshop before taking this workshop.
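Students should install the most recent Anaconda Python distribution on their computers, as well as the Python libraries requests, bs4 (published as beautifulsoup4), and lxml. The csv module is part of Python's standard library and needs no separate installation.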
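To confirm the setup before the workshop, a short check like the one below can be run in the Anaconda Python environment. It is a minimal sketch that only reports whether each module can be found, using the standard-library `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical. Note that `bs4` is the import name of the package installed as `beautifulsoup4`.

```python
import importlib.util

# Import names of the modules the workshop uses (csv ships with Python).
REQUIRED = ["requests", "bs4", "lxml", "csv"]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All workshop libraries are installed.")
```

Anything reported as missing can typically be installed with `conda install requests beautifulsoup4 lxml` (or the equivalent `pip install` command).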

Reception and Feedback

Feedback was very good. Students found the pace and content effective.

Needed/Desired Changes

There is some interest in expanding this workshop into a two or three part web scraping series.

License

Workshop leader: Filipa Calado, Graduate Center Digital Fellows

Creative Commons Attribution-ShareAlike 4.0 International License.
