# Satellite Image Processing: Overview

## General view of the data pipeline

Satellite platforms can acquire data with a multitude of sensors. The ones involved in the Challenge are multi-band (or multi-spectral) optical sensors. The data from satellite sensors is provided as rasters, i.e. matrices of georeferenced values. See a visualization example here.

A general overview of the challenges involved in building data pipelines for this kind of data is given in this post.

For an example of how fully processed data looks, you can have a look at one of the public repositories: Landsat 8, published on AWS. The only requirements to access the full dataset are an AWS account and an application token to use with AWS' client library boto3. The Challenge provides similar data, supplied by Planet and ESA, both accessible through the PlanetExplorer Web application or its command-line client. This repository requires credentials that will be provided for the duration of the project.
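As an illustration, here is a minimal sketch of pulling one band of a Landsat 8 scene from the public AWS bucket with boto3. The bucket and key names are assumptions based on how the archive has historically been laid out; check the repository's current documentation for the real paths.

```python
import boto3

# Minimal sketch: download one band of a Landsat 8 scene from the public
# AWS bucket. Bucket and key are illustrative assumptions; consult the
# repository's current documentation for the actual layout.
s3 = boto3.client("s3")
bucket = "landsat-pds"  # assumed name of the public Landsat 8 bucket
key = ("c1/L8/139/045/LC08_L1TP_139045_20170304_20170316_01_T1/"
       "LC08_L1TP_139045_20170304_20170316_01_T1_B4.TIF")
s3.download_file(bucket, key, "B4.TIF")
```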

## Extracting (example): Landsat 8, the most popular

Landsat 8 is the best dataset for learning the basics. The first example in the documentation provides this preview. The preview is just a JPEG "thumbnail" of the real deal: the different bands listed further down the page. For a general idea of the data, see paragraph 4.2 and the following sections in this document.

*[image: scene thumbnail]*

In the "Files" list you can see a list of the available files from the different bands: "BQA" and "B1" to "B10". To know on which section of the Electro-Magnetic Spectrum each band has been acquired you can read the table here:

*[image: Landsat 8 bands table]*

You can find each of these bands in the "Files" list. You can download the files with the .TIF extension, load them into numpy, and read the full matrices of pixel values (for Landsat 8 all of them, BQA included, are 16-bit). To compose the full optical image you need to stack the Red, Green and Blue bands. The easiest way to visualize this array as a picture is with QGIS, or by creating a multi-band image with the GDAL command-line tools.
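A minimal sketch of this step, assuming the rasterio library is available (GDAL's Python bindings would work too); the file names are illustrative:

```python
import numpy as np
import rasterio  # assumption: rasterio is installed

# Read the Landsat 8 Red, Green and Blue bands (B4, B3, B2) as numpy arrays.
bands = []
for name in ("B4", "B3", "B2"):  # illustrative file names
    with rasterio.open(f"LC08_scene_{name}.TIF") as src:
        bands.append(src.read(1).astype(np.float32))

# Stack into an (H, W, 3) array and rescale to 0..1 for display.
rgb = np.dstack(bands)
rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min())
```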

Every .TIF file is in reality a GeoTIFF: every band file is georeferenced, in other words you can load it and know the geographic position of every pixel in the image, given its Coordinate Reference System (CRS). Public datasets usually provide a resolution of 30, 60 or 90 meters per pixel; modern commercial constellations of small satellites in low orbit can reach resolutions of 5, 3 and even 1 meter per pixel (some providers state that they can provide up to 30 cm per pixel, but at that level the gain may collide with the relative imprecision of defining the true global position of a feature).
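For example, a short sketch (again assuming rasterio) that reads the CRS of a band file and maps a pixel index to coordinates in that CRS:

```python
import rasterio

with rasterio.open("LC08_scene_B4.TIF") as src:  # illustrative file name
    print(src.crs)           # the Coordinate Reference System, e.g. a UTM zone
    x, y = src.xy(100, 200)  # coordinates of the pixel at row 100, column 200
    print(x, y)
```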

For an overview of the GeoTIFF format and how it works, you can read A Handy Introduction to Cloud Optimized GeoTIFFs.

You can now probably understand this notebook, full of great examples, a little better. See also another cool data fusion example and its paper.

## Transforming: ML

Any signal that can be acquired at the pixel or pixel-neighbourhood level can be translated into counters using Computer Vision techniques (CNNs, ...) and turned into metrics (see the sketch after this list). Examples of georeferenced metrics per tile:

- number of trees
- number of boats
- quality of roads
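A toy sketch of the idea, where `detect_objects` stands in for a hypothetical CNN detector returning one class label per detection in a tile:

```python
from collections import Counter

def tile_metrics(tile_array, detect_objects):
    """Turn per-tile detections into counters.

    `detect_objects` is a hypothetical Computer Vision model (e.g. a CNN
    detector) returning a list of class labels found in `tile_array`.
    """
    counts = Counter(detect_objects(tile_array))
    return {"n_trees": counts["tree"], "n_boats": counts["boat"]}
```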

## Loading: Data science

Storage of this georeferenced, transformed data can be accomplished with SQL/NoSQL datastores or data warehouses.
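A minimal sketch using SQLite as a stand-in (a real pipeline might use e.g. PostGIS; table layout and values are illustrative):

```python
import sqlite3  # stand-in; a production setup might use PostGIS instead

conn = sqlite3.connect("metrics.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS tile_metrics "
    "(tile_id TEXT, lon REAL, lat REAL, metric TEXT, value REAL)"
)
# Illustrative row: one metric for one georeferenced tile.
conn.execute(
    "INSERT INTO tile_metrics VALUES (?, ?, ?, ?, ?)",
    ("tile_x12_y34", 88.10, 26.50, "n_trees", 1520.0),
)
conn.commit()
```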

Metrics then have to be cross-referenced with other sources of data for the same area in order to produce insights.

All the data is served using semantically linked APIs.
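A hypothetical sketch of such an endpoint with Flask, serving per-tile metrics as GeoJSON so they can be linked to other georeferenced sources (route and fields are assumptions):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/tiles/<tile_id>/metrics")
def tile_metrics(tile_id):
    # Hypothetical endpoint: a real service would look the feature up
    # in the datastore by tile_id instead of returning a fixed one.
    return jsonify({
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [88.10, 26.50]},
        "properties": {"tile_id": tile_id, "n_trees": 1520},
    })
```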

## Deliverables

Accurate risk and/or quality predictions and assessments, built from large amounts of optical data and its integration with related data, have to be produced and delivered to the customers through proper channels: a Web UI, but also Jupyter Notebooks or slideshows of highlighted pictures with plotted diagrams.