IMPORTANT NOTE: This repo relies on loading data from the ONS Open Geography Portal. Changes to that service may break these scripts.
This repository contains a number of scripts and source files for processing geographic data files for Census maps, Build a custom area profile, Explore local statistics and other products maintained by the ONS Data Visualisation team. The scripts also generate vector map tiles that are used across a number of other products.
The scripts generate the following output file types:
- JSON files (per area) with geometry, metadata and parent/child relationships for individual areas.
- Lookups from Census 2021 Output Areas to larger geographic areas.
- CSV tables of codes and names for area search/select in the above products.
- Vector tiles for various geography types.
The scripts in this repo have dependencies that will only run on a Mac or Linux system (including Windows Subsystem for Linux). You will need to install the following dependencies in order to run the scripts:
- NodeJS
- GDAL (included with QGIS)
- Tippecanoe
In addition, the scripts use commands including curl, zip, gzip and rm, which should be available on virtually any macOS or Linux-based system.
To run the scripts in this repo, you will first need to install the NodeJS dependencies:
npm install
You can then run each of the scripts below using the following command:
npm run {script-name}
It is important that the first three scripts are run in order before running the rest of the scripts, because they download and pre-process the necessary source files.
Alternatively, you can run all of the scripts - other than the vector tiles scripts - by running the following command (this may take around 30 minutes in total):
npm run all-scripts
This script will download and transform boundary and lookup files from the ONS Open Geography Portal, as well as MSOA Names from the House of Commons Library. The config can be found at /config/source-files.js.
This script generates a lookup file at /config/lookup/lookup_metadata.csv containing area codes and names extracted from the downloaded geographic boundary files. It also adds unofficial names from the MSOA names CSV file, and calculates start and end dates for new and terminated geographies.
This script derives parent geographies and successor geographies (for terminated areas) based on a point-in-polygon method. Calculating these relations "on the fly" avoids the need for a large number of external lookup files. The output of this script is merged with the metadata generated in the previous step and written to the file /config/lookup/lookup.csv.
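As a rough illustration of the point-in-polygon idea, the sketch below uses ray casting to test whether a representative point for a child area falls inside a candidate parent's boundary. The names `pointInRing` and `findParent`, the area codes and the coordinates are all hypothetical and are not taken from the scripts in this repo. The sketch handles a single outer ring only, and ignores holes and multi-part polygons.

```javascript
// Ray-casting test: is the point [x, y] inside a single polygon ring?
// `ring` is an array of [x, y] vertices (closed or unclosed).
function pointInRing(point, ring) {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Does a horizontal ray running right from the point cross edge j -> i?
    const crosses =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Hypothetical parent areas, each described by one square ring.
const parents = [
  { code: "PARENT_A", ring: [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]] },
  { code: "PARENT_B", ring: [[10, 0], [20, 0], [20, 10], [10, 10], [10, 0]] },
];

// Return the code of the first parent containing the point, or null.
function findParent(point, candidates) {
  const match = candidates.find((p) => pointInRing(point, p.ring));
  return match ? match.code : null;
}
```

Calling `findParent([5, 5], parents)` would pick the first square, while a point outside both squares yields `null`, which is one way a script could flag areas that have no parent of a given type.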
This script generates CSV lists of codes and names for places that can be found via search in Build a custom area profile and Explore local statistics. The metadata is extracted from the lookup file generated by the make-relations script.
This script calculates accurate best-fit lookups for every geography based on Output Area, LSOA and MSOA population-weighted centroids. These lookups are required for the Build a custom area profile tool.
This script generates the geography files required for Census maps (in the /output/cm-geos folder), and a common format shared by Build a custom area profile and Explore local statistics (in the /output/geos folder). It also generates name/code list CSVs for the latter two products in the /output folder. The config can be found at /config/geo-config.js.
An example geo file for Census maps can be found here. An example file for the other products can be found here.
Important! Make sure that the /output/geos folder is empty before running this script.
Note: The .json output files in the output folder are actually gzipped, so cannot be opened directly. If you want to inspect their contents, you need to add .gz to their filename, and then gunzip them.
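If you would rather inspect a file from NodeJS than rename and gunzip it, a minimal sketch using only the built-in `fs` and `zlib` modules is shown below. The helper name `readGzippedJson` is hypothetical and is not part of this repo.

```javascript
const fs = require("fs");
const zlib = require("zlib");

// Read a gzip-compressed JSON file and return the parsed object.
function readGzippedJson(path) {
  const compressed = fs.readFileSync(path);
  return JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));
}

// Example usage (replace {file} with a real file name):
// const area = readGzippedJson("./output/geos/{file}.json");
// console.log(Object.keys(area));
```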
This optional script can be run to generate geography metadata files for Census maps. It relies on the output of the make-geos script.
This script generates vector tiles for the most commonly used smaller geography types, including local authorities, wards and statistical geographies. The output is in the form of .mbtiles files, written to the /output/vtiles folder. The config can be found at /config/vtiles-config.js. (Example output).
The vector tiles can be previewed by installing tileserver-gl-light and running the following command for the specific file you want to preview:
tileserver-gl-light ./output/vtiles/{file}.mbtiles
These two scripts are used in sequence to generate vector tiles for output areas suitable for both high and low zoom levels, specifically designed for use in the original version of Build a custom area profile. (Example output).
Whereas Tippecanoe is capable of merging together small areas to cater to lower zoom levels, these scripts explicitly merge the smallest OAs into their LSOA and MSOA parents, which is preferable both visually and in terms of the functionality of the above product.
Note: These tiles can also be used for standard map visualisations in the same way as the other vector tiles generated by the make-vtiles script.
This script unpacks the .mbtiles files in the ./output/vtiles folder into a directory structure using Tippecanoe's tile-join command, and then zips the directories into the same output folder.
These ZIP files are suitable for uploading and unzipping to serve from a static file service such as AWS S3.
Note: Serving from the .mbtiles files directly would require a vector tiles server.
This script takes the National Statistics Postcode Lookup as an input and generates a series of JSON files in the form /P.json, /PO.json, /PO1.json, /PO14.json etc. that can be used for autocomplete postcode searches. The files contain postcodes with those prefixes and their lng/lat centroid coordinates.
The 4-character files contain all postcodes with that prefix. The 1, 2 and 3-character files contain only the first 10 postcodes that match, which are provided for the purposes of autocomplete while typing.
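The prefix rule described above can be sketched as follows. This is an illustration only: the function name `buildPrefixIndex`, the entry shape `{ postcode, lng, lat }`, and the assumption that prefixes are taken from upper-cased postcodes with spaces removed are all hypothetical and may differ from what the script does.

```javascript
// Group postcode entries by their 1 to 4 character prefixes.
// Assumption: prefixes come from the postcode with spaces removed.
function buildPrefixIndex(entries, maxShort = 10) {
  const index = new Map();
  for (const entry of entries) {
    const key = entry.postcode.replace(/\s+/g, "").toUpperCase();
    for (let len = 1; len <= Math.min(4, key.length); len++) {
      const prefix = key.slice(0, len);
      if (!index.has(prefix)) index.set(prefix, []);
      const list = index.get(prefix);
      // 4-character prefixes keep every match; shorter prefixes keep
      // only the first `maxShort` matches, for autocomplete.
      if (len === 4 || list.length < maxShort) list.push(entry);
    }
  }
  return index;
}
```

Each key of the returned map would correspond to one output file, for example the entries under `PO14` would be written to /PO14.json.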
This script generates a directory of tiled GeoJSON files that can be used for point-in-polygon lookup. The directory structure follows a {x}/{y}.json pattern, equivalent to zoom level 12 web map tile coordinates.
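To find which tile file covers a given coordinate, a client can convert the lng/lat to zoom level 12 tile indices with the standard Web Mercator tile formula. The sketch below shows that conversion. The function names `lngLatToTile` and `tilePath` are hypothetical, and the relative path it returns assumes the {x}/{y}.json pattern described above.

```javascript
// Convert a lng/lat (in degrees) to Web Mercator tile indices.
function lngLatToTile(lng, lat, zoom = 12) {
  const n = 2 ** zoom; // number of tiles along each axis
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lng + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { x, y };
}

// Build the relative path of the tile file covering a coordinate.
function tilePath(lng, lat) {
  const { x, y } = lngLatToTile(lng, lat);
  return `${x}/${y}.json`;
}
```

A client would then fetch that file and run a point-in-polygon test against the features it contains, rather than loading boundaries for the whole country.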
All of the config files for these scripts can be found within the /config folder. The files that need to be updated with each geography update are:
- source-files.js - Defines the files to download from the ONS Geoportal.
- geo-config.js - Defines the geography types and which years they cover.
- vtiles-config.js - Defines the vector tiles outputs.
At the moment, the best way to create partial outputs from the scripts is to comment out lines in geo-config.js and/or vtiles-config.js (e.g. to generate vector tiles for only one geography type).
It should be possible to add additional geographies or years by modifying the config files, without the need to edit the script files. You will typically need to add multiple entries to /config/source-files.js (OA/parent lookups, names and boundaries) for each new geography/year, as well as updating the /config/geo-config.js and /config/vtiles-config.js files.
If you do need to edit the processing scripts, either to fix bugs or add features, you can find these in the /scripts folder. A number of shared functions are included in the /scripts/utils.js file.
There are a number of features that could be added to these scripts in future. These include:
- Generating the raster mask map tiles used in small area mapping products (such as Census maps).
- Calculating neighbours and related geographies based on boundary files.
- Adding various additional metadata to the geo files, including neighbours and codes to allow linkages to other products and API data sources.