South Africa Street History Mapping

We were curious about visualising how street names correlate to their language or place of origin in South Africa - a country whose history is marked by significant power struggles and complex race relations. This repo provides the code for creating maps of street networks colour coded by place of origin and language.

This readme is divided into:

Results
Academic Outputs
Running the code
Helpful Notebooks
Contact

Results

Area	Dictionary Lookup	Language Detector
Johannesburg
Soweto
Sandton
Cape Town

Academic Outputs

Poster for IC2S2 2024
Paper coming soon!

Running the code

This section describes how to run the code. Feel free to open an issue if you have any questions!

Prerequisites

If windows, git bash
Docker
Poetry
~5GB Disk Space (Docker images + data)

Setup

Poetry is used to manage packages and virtual environments.

 poetry shell

 poetry install

Data Download Pipeline

1. Retrieve streets for relevant countries

Core Code: download_country_streets.py

We first need to download all street names for South Africa and selected countries that have played a role in South Africa's history (see countries). Data is downloaded using the Overpass API - an API that retrieves data easily from OpenStreetMaps.

To retrieve street data, check that you're happy with what countries are being retrieved and run:

python ./src/street_list_download/main.py

If you're on a slurm enabled cluster, you can run

sbatch ./scripts/1_retrieve-streets.sbatch

The outputs of this script are saved to streets in CSV format.

2. Process street data

Core Code: preprocess_country_streets.py

We now process the street names for the various countries so that we end up with a dictionary of terms for the country. Each street name is:

Exploded by space (e.g. so that Nottingham Road becomes [Nottingham, Road])
Converted to lowercase

This results in a dataframe of terms. Empty, NaN, digit, and duplicate terms are dropped. Words less than a certain length are also dropped.

To process the street data, run:

python ./src/street_list_preprocessing/main.py

If you're on a slurm enabled cluster, you can run

sbatch ./scripts/2_process-streets.sbatch

The outputs of this script are saved to streets in CSV format with the prefix "processed". Additionally, all terms and the corresponding origin country are saved to a sqlite database in output/street_history.sqlite in the table street_terms.

3. Build Dictionary

Core Code: build_dictionary_for_term.py

Now that we have all the terms for each selected country, we can build a lookup dictionary for each term for a "home" country. In our case, South Africa is the home country.

For each term in South Africa's terms data from the previous step, the term is looked up in the street_terms table. If the term is matched to one or more countries (including in the home country), the term is saved in a dictionary table and assigned a likelihood based on the frequency of the term appearing in different countries.

The term, origin, and likelihood are saved to a sqlite database in output/street_history.sqlite in a table with the format <country>_terms_dictionary.

To build a dictionary of terms for a specific country, run:

python ./src/dictionary_builder/main.py $COUNTRY

Where $COUNTRY is south_africa in the case of this repo but could be modified to other countries that have been downloaded.

If you're on a slurm enabled cluster, you can run:

sbatch ./scripts/3_build-dictionary-south-africa.sbatch

4. Map

Finally, we can now map street names for a particular area in the "home" country. To do this, OSMNX is used to retrieve a street network graph for an area. The street names in the network are preprocessed to produce terms for each name. The terms are looked up in the dictionary and the term with the highest likelihood origin is used to set the origin (excluding "stop" words like road, avenue, etc). The street is then mapped with a colour coding matching the allocated origin.

Additionally, an option is included to instead map the streets by language which needs some further work but produces interesting results. This second mapping uses lingua to detect the language of the terms provided.

To map all street names in a region, run the end-to-end mapping - e.g.:

python ./src/mapping/map-e2e.py "Johannesburg, South Africa" --distance 30000 --fig_size 64

Helpful Notebooks

There are a bunch of Jupyter notebooks in the notebooks folder which may be useful for you to play around with.

Contact

Feel free to reach out to me either via this repo or [email protected].

Name		Name	Last commit message	Last commit date
Latest commit History 98 Commits
data		data
images		images
notebooks		notebooks
output		output
scripts		scripts
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

South Africa Street History Mapping

Results

Academic Outputs

Running the code

Prerequisites

Setup

Data Download Pipeline

1. Retrieve streets for relevant countries

2. Process street data

3. Build Dictionary

4. Map

Helpful Notebooks

Contact

About

Releases

Packages

Languages

License

Emily-RoseSteyn/south-africa-street-history-mapping

Folders and files

Latest commit

History

Repository files navigation

South Africa Street History Mapping

Results

Academic Outputs

Running the code

Prerequisites

Setup

Data Download Pipeline

1. Retrieve streets for relevant countries

2. Process street data

3. Build Dictionary

4. Map

Helpful Notebooks

Contact

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages