diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 351d5ff5..c4b8db11 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -3,7 +3,7 @@ repos: rev: 0.12.0 hooks: - id: nbqa-black - additional_dependencies: [black==21.5b2] + additional_dependencies: [black==21.5b2, click==8.0.4] args: [--nbqa-mutate] - id: nbqa-flake8 additional_dependencies: [flake8==3.9.2] diff --git a/README.md b/README.md index 4a4d64d0..9ebe98f9 100644 --- a/README.md +++ b/README.md @@ -52,6 +52,7 @@ These tutorials introduce a large topic and cover it in detail. * [Local tools](tutorials/local-tools.ipynb) * [Label-Maker](tutorials/label-maker-dask.ipynb) * [LandCoverNet Dataset on Radiant MLHub](tutorials/radiant-mlhub-landcovernet.ipynb) +* [On-Demand Training Data on Planetary Computer](tutorials/radiant-mlhub-on-demand-training-data.ipynb) ## Learn More diff --git a/tutorials/radiant-mlhub-on-demand-training-data.ipynb b/tutorials/radiant-mlhub-on-demand-training-data.ipynb new file mode 100644 index 00000000..67ce529b --- /dev/null +++ b/tutorials/radiant-mlhub-on-demand-training-data.ipynb @@ -0,0 +1,3753 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a7cf916c-0991-4af6-889f-a1ce86696d46", + "metadata": {}, + "source": [ + "## On Demand Training Data from Radiant MLHub and Planetary Computer\n", + "\n", + "Radiant MLHub Logo" + ] + }, + { + "cell_type": "markdown", + "id": "010b5b89-32c4-4f6b-81fb-f41d782d251f", + "metadata": {}, + "source": [ + "In this tutorial, we will walk through the process of requesting on-demand training data from the [Planetary Computer Data Catalog](https://planetarycomputer.microsoft.com/catalog) to pair with the [BigEarthNet](https://mlhub.earth/data/bigearthnet_v1) dataset downloaded from Radiant MLHub. This is an important workflow for someone in the geospatial community who wants to train an ML model on a data source outside of a prepackaged dataset, such as those found on MLHub. 
They can start with any dataset containing source image and label collections in STAC, obtain a random sample to work with, fetch source images from a different collection or satellite product, and then reproject and crop those images to match the spatial and temporal extent of the original dataset.\n", + "\n", + "**NOTE:** because the workflow documented below uses libraries like `pystac_client` and `stackstac`, the datasets queried need to be organized into STAC Collections." + ] + }, + { + "cell_type": "markdown", + "id": "f130b365-6fff-4d2a-86a9-39085ab13886", + "metadata": {}, + "source": [ + "Let's start by importing the Python libraries we'll use in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9fe7b447-cf8a-4cc7-aaed-4e8407d0f270", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install --upgrade wget # not installed on PC by default" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "7e144460-5549-4ab4-ba98-10a1a7ebd236", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/kendallsmith/opt/anaconda3/envs/mlhub/lib/python3.9/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.10.0-CAPI-1.16.0) is incompatible with the GEOS version PyGEOS was compiled with (3.10.1-CAPI-1.16.0). 
Conversions between both will be slow.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "import getpass\n", + "import tempfile\n", + "from pathlib import Path\n", + "import os\n", + "import json\n", + "from glob import glob\n", + "import requests\n", + "from typing import List, Tuple\n", + "from datetime import datetime as dt\n", + "from datetime import timedelta as td\n", + "\n", + "from radiant_mlhub import Collection\n", + "import planetary_computer\n", + "import pystac_client\n", + "from pystac import ItemCollection, Item, Asset\n", + "from dask import delayed, compute, distributed\n", + "\n", + "import numpy as np\n", + "from stackstac import stack\n", + "from geopandas import GeoDataFrame\n", + "import rasterio as rio\n", + "import rioxarray\n", + "from xarray import DataArray\n", + "from shapely.geometry import shape\n", + "from shapely.geometry import Polygon\n", + "from pyproj import CRS" + ] + }, + { + "cell_type": "markdown", + "id": "3a5bb87b-9d9c-4140-bac8-3e95f146c029", + "metadata": {}, + "source": [ + "### Define global variables" + ] + }, + { + "cell_type": "markdown", + "id": "2de2f657-3229-4037-bf9c-541c503cc269", + "metadata": {}, + "source": [ + "In addition to the API key, we need to define a few global variables to get the workflow started: a temporary working directory to download and write data to, the STAC API endpoints, the names of the Collections, and other values such as the RGB bands for those Collections. Adjust these to suit your own needs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 60, + "id": "f3bf07a1-3a4a-4207-a449-7be766fa7e36", + "metadata": {}, + "outputs": [], + "source": [ + "# Temporary working directory on local machine or PC instance\n", + "TMP_DIR = tempfile.gettempdir()\n", + "\n", + "# API endpoints for MLHub and Planetary Computer catalogs\n", + "MLHUB_API_URL = \"https://api.radiant.earth/mlhub/v1\"\n", + "MSPC_API_URL = \"https://planetarycomputer.microsoft.com/api/stac/v1\"\n", + "\n", + "# Names of Collections that will be queried against using pystac_client\n", + "BIGEARTHNET_SOURCE_COLLECTION = \"bigearthnet_v1_source\" # sentinel-2 source imagery\n", + "BIGEARTHNET_LABEL_COLLECTION = \"bigearthnet_v1_labels\" # geojson classification labels\n", + "PLANETARY_COMPUTER_LANDSAT_8 = \"landsat-8-c2-l2\" # landsat 8 source imagery on PC\n", + "OUTPUT_DIR = \"landsat_8_source\"\n", + "\n", + "# Default variables that will be used in the API queries\n", + "BIGEARTHNET_TIME_RANGE = \"2017-06-01/2018-05-31\" # full date range for BigEarthNet\n", + "LABEL_CRS = CRS(\"EPSG:4326\")\n", + "DATE_BUFFER = 60 # days to search before and after the source Item date\n", + "LANDSAT_8_RGB_BANDS = [\"SR_B4\", \"SR_B3\", \"SR_B2\"] # names of RGB bands from PC Landsat 8\n", + "BIGEARTHNET_RGB_BANDS = [\"B04\", \"B03\", \"B02\"] # names of RGB bands from BigEarthNet\n", + "\n", + "# Bounding box for demonstration fetching Items over Luxembourg\n", + "LUXEMBOURG_AOI = [6.06, 49.58, 6.21, 49.66] # aoi around Luxembourg" + ] + }, + { + "cell_type": "markdown", + "id": "5d5e31b3-5cde-4a7b-af43-dc194b06d0a0", + "metadata": {}, + "source": [ + "### Authentication with Radiant MLHub" + ] + }, + { + "cell_type": "markdown", + "id": "5f11e821-b98b-4df1-a26c-826c9bdbec50", + "metadata": {}, + "source": [ + "Programmatic access to the Radiant MLHub API using the `pystac_client` library requires both the API endpoint and an API key. You can obtain an API key for free by registering an account on [mlhub.earth](https://mlhub.earth/). 
This can be found under `Settings & API Key` from the drop-down once logged in." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "f4c9dd60-3abc-464d-af25-4b23c0d2783b", + "metadata": {}, + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + "MLHub API Key: ································································\n" + ] + } + ], + "source": [ + "MLHUB_API_KEY = getpass.getpass(prompt=\"MLHub API Key: \")" + ] + }, + { + "cell_type": "markdown", + "id": "ccad2439-109f-4eef-ac3b-dac65f54e3aa", + "metadata": {}, + "source": [ + "Once you have your API key, you need to update the default profile file in your home directory. You can use the `mlhub configure` command line tool to do this:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "cfd2681d-56ed-440c-a352-b4ecfd125c03", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Overwrite existing API Key (****b0eced) [y/N]: ^C\n", + "Aborted!\n" + ] + } + ], + "source": [ + "!mlhub configure --api-key={MLHUB_API_KEY}" + ] + }, + { + "cell_type": "markdown", + "id": "5c77d029-191e-42a5-8250-bc451b80f247", + "metadata": {}, + "source": [ + "### Configure API connection to Radiant MLHub" + ] + }, + { + "cell_type": "markdown", + "id": "b0480635-eef5-4e3f-847a-5060c409ae4f", + "metadata": {}, + "source": [ + "This makes a connection to the Radiant MLHub Data Catalog using the API endpoint URL, and the API key from your account." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "bf79c301-76df-4158-bf97-3da53552e143", + "metadata": {}, + "outputs": [], + "source": [ + "mlhub_catalog = pystac_client.Client.open(\n", + " url=MLHUB_API_URL, parameters={\"key\": MLHUB_API_KEY}, ignore_conformance=True\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "637734bc-af77-4c81-ab26-98e808de6415", + "metadata": {}, + "source": [ + "### Fetch label items from BigEarthNet over Luxembourg" + ] + }, + { + "cell_type": "markdown", + "id": "e9f19ce4-57fd-43d0-9263-7a5edd106ee8", + "metadata": {}, + "source": [ + "We will now use the `search` method of the API client to get label Items over Luxembourg as a sample use case." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "fc61a3d4-cfc0-4da5-9717-bf3c8e427100", + "metadata": {}, + "outputs": [], + "source": [ + "origin_label_items = mlhub_catalog.search(\n", + " collections=BIGEARTHNET_LABEL_COLLECTION,\n", + " bbox=LUXEMBOURG_AOI,\n", + " datetime=BIGEARTHNET_TIME_RANGE,\n", + ").get_all_items()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "917c392c-9ae5-43e2-b7e0-71dd9f749cc2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "178" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(origin_label_items)" + ] + }, + { + "cell_type": "markdown", + "id": "e9d121a9-e54b-4039-9cef-04277962c2ca", + "metadata": {}, + "source": [ + "The helper function below displays the label geometries from an ItemCollection overlaid on a map of the region." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "2f203e31-5b09-4f2e-bf27-9a3ef8e3fc4d", + "metadata": {}, + "outputs": [], + "source": [ + "def explore_search_extent(items: ItemCollection) -> None:\n", + " \"\"\"Extracts geometry from ItemCollection to display polygons on a map.\n", + "\n", + " Args:\n", + " items: ItemCollection of Items retrieved from pystac_client search\n", + "\n", + " Returns:\n", + " GeoDataFrame object with the .explore() method called\n", + " \"\"\"\n", + " item_feature_collection = items.to_dict()\n", + " geom_df = GeoDataFrame.from_features(item_feature_collection).set_crs(4326)\n", + " print(geom_df.bounds)\n", + " return geom_df[[\"geometry\", \"datetime\"]].explore(\n", + " column=\"datetime\", style_kwds={\"fillOpacity\": 0.2}, cmap=\"viridis\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "ab923855-3c20-4f10-af00-d4175a54fdd4", + "metadata": {}, + "source": [ + "Here are the BigEarthNet chips with their bounding boxes that matched the spatial parameters for the city of Luxembourg and surrounding areas." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "b86bf5d8-8dc8-491d-b3cd-1ea1263774ca", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " minx miny maxx maxy\n", + "0 6.197958 49.579464 6.215240 49.590700\n", + "1 6.198663 49.590240 6.215949 49.601477\n", + "2 6.199368 49.601017 6.216659 49.612254\n", + "3 6.180682 49.569146 6.197958 49.580381\n", + "4 6.181383 49.579923 6.198663 49.591158\n", + ".. ... ... ... ...\n", + "173 6.151709 49.634721 6.169003 49.645951\n", + "174 6.152406 49.645498 6.169703 49.656729\n", + "175 6.153102 49.656275 6.170404 49.667506\n", + "176 6.135808 49.645951 6.153102 49.657180\n", + "177 6.136501 49.656729 6.153800 49.667957\n", + "\n", + "[178 rows x 4 columns]\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "explore_search_extent(origin_label_items)" + ] + }, + { + "cell_type": "markdown", + "id": "13c6e607-c79b-4678-a483-9bacb0b3b1df", + "metadata": {}, + "source": [ + "### Download the entire label collection for BigEarthNet from Radiant MLHub" + ] + }, + { + "cell_type": "markdown", + "id": "2e131160-50bb-487f-8fcb-96d37ce80167", + "metadata": {}, + "source": [ + "We could use the method above to query label Items directly from the Radiant MLHub API endpoint. However, on very large Collections such as BigEarthNet, pagination becomes a bottleneck when fetching and resolving STAC Items, because the API returns only 100 Items at a time. Querying the entire Collection of nearly 600,000 Items could take hours.\n", + "\n", + "Therefore, downloading the label Collection archive directly (which is only 160 MB) is preferable to paginating over the entire Collection using the API." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "93f67824-bf8f-4cce-862e-856d9cfed26d", + "metadata": {}, + "outputs": [], + "source": [ + "label_collection_path = os.path.join(\n", + " TMP_DIR, BIGEARTHNET_LABEL_COLLECTION, \"collection.json\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "21bd7e52-6bc7-4a8d-bf56-060e80f4019b", + "metadata": {}, + "source": [ + "Check whether the collection folder already exists before downloading the 173 MB archive. If it does not, download and uncompress the `.tar.gz` file to extract the label collection files." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "81f5a8c4-c604-4f8a-8fb9-1d90607206ee", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a454f851de6b45289f1db2c42a44beb4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + " 0%| | 0/173.0 [00:00,\n", + " ,\n", + " ,\n", + " ]" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "first_label_item.links" + ] + }, + { + "cell_type": "markdown", + "id": "5d23fb83-400e-487b-940f-fc158b5ae41b", + "metadata": { + "tags": [] + }, + "source": [ + "### Fetch source items for random sample from BigEarthNet" + ] + }, + { + "cell_type": "markdown", + "id": "e419d96f-f8b9-4911-b5d1-4a4077767062", + "metadata": {}, + "source": [ + "If we had the source collection archive downloaded and uncompressed in the same parent directory as the labels collection, we could reference the source Items and images directly. However the BigEarthNet source collection is over 60GB when compressed. Therefore to work around the disk size limitations of a Planetary Computer instance, we can query the same source items from the MLHub API endpoint, the same way we got the labels, but filter to the exact source item using IDs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "fa334c57-9dbe-495e-b040-980adf1192c4", + "metadata": {}, + "outputs": [], + "source": [ + "def get_source_item_ids(label_item: Item) -> List[str]:\n", + " return [\n", + " link.href.split(\"/\")[-2] for link in label_item.links if link.rel == \"source\"\n", + " ]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "f4fa6f4b-8463-40a4-8aec-527d868be5e2", + "metadata": {}, + "outputs": [], + "source": [ + "origin_source_items = mlhub_catalog.search(\n", + " collections=[BIGEARTHNET_SOURCE_COLLECTION],\n", + " ids=get_source_item_ids(first_label_item),\n", + ").get_all_items()" + ] + }, + { + "cell_type": "markdown", + "id": "03e775fa-dd60-4623-8d44-3d83beafcb2c", + "metadata": {}, + "source": [ + "This is the number of source Items returned by the MLHub API for the source Item IDs linked from the first label Item." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "2c98a5d6-1a55-466a-b768-4433cea148ca", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(origin_source_items)" + ] + }, + { + "cell_type": "markdown", + "id": "c0ba4eaf-03d8-4e56-92c8-4e98b26ba983", + "metadata": {}, + "source": [ + "Taking a look at some of the properties of the first source Item found:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "854abe53-e1b7-452a-8aa2-c139aa220348", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "bigearthnet_v1_source_S2A_MSIL2A_20180413T95032_75_25\n", + "2018-04-13 09:50:32+00:00\n", + "[25.018573933032233, 60.14150761429367, 25.040815502323284, 60.15259741371799]\n", + "{'gsd': 30, 'datetime': '2018-04-13T09:50:32Z', 'eo:bands': [{'name': 'B01', 'common_name': 'Coastal Aerosol', 'description': 
'Coastal Aerosol'}, {'name': 'B02', 'common_name': 'Blue', 'description': 'Blue'}, {'name': 'B03', 'common_name': 'Green', 'description': 'Green'}, {'name': 'B04', 'common_name': 'Red', 'description': 'Red'}, {'name': 'B05', 'common_name': 'Vegetation Red Edge', 'description': 'Vegetation Red Edge (704.1nm)'}, {'name': 'B06', 'common_name': 'Vegetation Red Edge', 'description': 'Vegetation Red Edge (740.1nm)'}, {'name': 'B07', 'common_name': 'Vegetation Red Edge', 'description': 'Vegetation Red Edge (782.8nm)'}, {'name': 'B08', 'common_name': 'NIR', 'description': 'NIR'}, {'name': 'B8A', 'common_name': 'Narrow NIR', 'description': 'Narrow NIR'}, {'name': 'B09', 'common_name': 'Water Vapour', 'description': 'Water Vapour'}, {'name': 'B11', 'common_name': 'SWIR', 'description': 'SWIR (1613.7nm)'}, {'name': 'B12', 'common_name': 'SWIR', 'description': 'SWIR (2202.4nm)'}], 'platform': 'Sentinel-2', 'instruments': ['MSI'], 'constellation': 'Sentinel-2'}\n" + ] + } + ], + "source": [ + "for source_item in origin_source_items:\n", + " print(source_item.id)\n", + " print(source_item.datetime)\n", + " print(source_item.bbox)\n", + " print(source_item.properties)\n", + " break" + ] + }, + { + "cell_type": "markdown", + "id": "5303475f-d29b-4e84-82ef-d355ef0519de", + "metadata": {}, + "source": [ + "With the properties from this sample source Item, we can observe where the chip is located, the relevant Sentinel-2 bands (assets) and datetime the image was captured." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "01d88ffe-f530-46d6-87d9-4337c1ec0202", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " minx miny maxx maxy\n", + "0 25.018574 60.141508 25.040816 60.152597\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "explore_search_extent(origin_source_items)" + ] + }, + { + "cell_type": "markdown", + "id": "77109ba6-f0a1-426e-8884-39a5fac2795f", + "metadata": {}, + "source": [ + "This is the location of the source Item linked from the sampled label Item." + ] + }, + { + "cell_type": "markdown", + "id": "a5e0cbba-32b8-47ad-9913-e7ba2a939922", + "metadata": { + "tags": [] + }, + "source": [ + "### Fetch Landsat 8 scenes based on source Item bbox and datetime" + ] + }, + { + "cell_type": "markdown", + "id": "4392b23e-eebb-4a53-88d8-f49fbfacfaf1", + "metadata": {}, + "source": [ + "First, define the helper functions used to query the Microsoft Planetary Computer STAC endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "1eddda32-d50d-413d-add8-dedaa2f9a067", + "metadata": {}, + "outputs": [], + "source": [ + "def temporal_buffer(item_datetime: str, date_delta: int) -> str:\n", + " \"\"\"Takes a datetime string and returns a buffer around that date\n", + "\n", + " Args:\n", + " item_datetime: string of the datetime property from an Item\n", + " date_delta: integer for days to add before and after a date\n", + "\n", + " Returns:\n", + " a string range representing the full date buffer\n", + " \"\"\"\n", + " delta = td(days=date_delta)\n", + " item_dt = dt.strptime(item_datetime, \"%Y-%m-%dT%H:%M:%SZ\")\n", + "\n", + " dt_start = item_dt - delta\n", + " dt_start_str = dt_start.strftime(\"%Y-%m-%d\")\n", + "\n", + " dt_end = item_dt + delta\n", + " dt_end_str = dt_end.strftime(\"%Y-%m-%d\")\n", + "\n", + " return f\"{dt_start_str}/{dt_end_str}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "f77e4ec0-a1b8-490c-b732-4c10d89b06ea", + "metadata": {}, + "outputs": [], + "source": [ + "def min_cloud_cover_scene(label_geom: Polygon, search_items: ItemCollection) -> Item:\n", + " \"\"\"Finds the Item with minimal cloud 
cover from an ItemCollection\n", + "\n", + " Args:\n", + " label_geom: Polygon geometry to ensure label completely within scene\n", + " search_items: ItemCollection of the Items found from pystac_client search\n", + "\n", + " Returns:\n", + " Item where label completely contained within, and minimal cloud cover\n", + " \"\"\"\n", + " min_cc = np.inf\n", + " min_cc_item = None\n", + " for item in search_items:\n", + " item_geom = shape(item.geometry)\n", + " item_cc = item.properties[\"eo:cloud_cover\"]\n", + " if item_cc < min_cc and label_geom.within(item_geom):\n", + " min_cc = item_cc\n", + " min_cc_item = item\n", + " return min_cc_item" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "c3b23fa5-d588-42d4-b913-fab7819e7ea3", + "metadata": {}, + "outputs": [], + "source": [ + "def get_landsat_8_match(label_item: Item) -> Tuple[Item, Item]:\n", + " \"\"\"Finds the best Landsat 8 match using source Item datetime and bounding box.\n", + "\n", + " Args:\n", + " label_item: the STAC label Item object\n", + "\n", + " Returns:\n", + " Tuple of the BigEarthNet source Item and the Landsat 8 match Item\n", + " \"\"\"\n", + " # get the matching source Item properties\n", + " source_items = mlhub_catalog.search(\n", + " collections=[BIGEARTHNET_SOURCE_COLLECTION],\n", + " ids=get_source_item_ids(label_item),\n", + " ).get_all_items()\n", + "\n", + " if source_items:\n", + " source_item = source_items[0]\n", + " source_bbox = source_item.bbox\n", + " source_datetime = source_item.properties[\"datetime\"]\n", + "\n", + " # search PC Catalog for L8 Items\n", + " l8_items = mspc_catalog.search(\n", + " collections=PLANETARY_COMPUTER_LANDSAT_8,\n", + " bbox=source_bbox,\n", + " datetime=temporal_buffer(source_datetime, DATE_BUFFER),\n", + " ).get_all_items()\n", + "\n", + " # filter to best L8 Item match\n", + " signed_l8_items = planetary_computer.sign(l8_items)\n", + " best_l8_match = min_cloud_cover_scene(\n", + " shape(source_item.geometry), 
signed_l8_items\n", + " )\n", + "\n", + " if not best_l8_match:\n", + " print(\n", + " \"No Landsat 8 Item was found on the Planetary \"\n", + " \"Computer matching the query parameters:\"\n", + " )\n", + " print(\n", + " f\"Source Item ID: {source_item.id}, \"\n", + " f\"Bbox: {source_bbox}, \"\n", + " f\"Datetime: {source_datetime}\"\n", + " )\n", + " best_l8_match = None\n", + " else:\n", + " print(\n", + " \"No Sentinel-2 source Item was found in the \"\n", + " \"BigEarthNet dataset matching that label item!\"\n", + " )\n", + " source_item = best_l8_match = None\n", + " return source_item, best_l8_match" + ] + }, + { + "cell_type": "markdown", + "id": "69ad7ef1-143e-47f0-b6b9-26c73e5cc65d", + "metadata": {}, + "source": [ + "Since it is known that the BigEarthNet dataset from MLHub has a 1-to-1 pairing of source and labels, we can safely assume the first source item is the appropriate match for our label." + ] + }, + { + "cell_type": "markdown", + "id": "ab77ef46-b54a-42e7-a780-bc8e8094ec6f", + "metadata": {}, + "source": [ + "This makes a connection to the Planetary Computer Data Catalog using the API endpoint URL." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "38280716-e130-4de4-9699-2c0bcbfc056d", + "metadata": {}, + "outputs": [], + "source": [ + "mspc_catalog = pystac_client.Client.open(MSPC_API_URL)" + ] + }, + { + "cell_type": "markdown", + "id": "39f35d02-d2de-4be5-9317-c80214502c88", + "metadata": {}, + "source": [ + "We will now use the API client with the helper function above to fetch the best Landsat 8 match for the sampled label Item. This keeps only the scenes that completely contain the label chip and returns the one with the least cloud cover." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "53d3ab14-86c7-42a8-be96-8156f5e74d64", + "metadata": {}, + "outputs": [], + "source": [ + "source_item, best_l8_match = get_landsat_8_match(first_label_item)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "53556cf0-375b-4044-9262-d16de707a13d", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "LC08_L2SP_187018_20180510_02_T1\n", + "[24.323999281168234, 58.95366546927769, 28.882648861314085, 61.187374530722316]\n", + "{'datetime': '2018-05-10T09:22:33.464049Z', 'platform': 'landsat-8', 'proj:bbox': [356085.0, 6537585.0, 601215.0, 6785115.0], 'proj:epsg': 32635, 'description': 'Landsat Collection 2 Level-2 Surface Reflectance Product', 'instruments': ['oli', 'tirs'], 'eo:cloud_cover': 0.01, 'view:off_nadir': 0, 'landsat:wrs_row': '018', 'landsat:scene_id': 'LC81870182018130LGN00', 'landsat:wrs_path': '187', 'landsat:wrs_type': '2', 'view:sun_azimuth': 163.43293558, 'view:sun_elevation': 46.70358845, 'landsat:cloud_cover_land': 0.01, 'landsat:processing_level': 'L2SP', 'landsat:collection_number': '02', 'landsat:collection_category': 'T1'}\n" + ] + } + ], + "source": [ + "if best_l8_match:\n", + " print(best_l8_match.id)\n", + " print(best_l8_match.bbox)\n", + " print(best_l8_match.properties)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "3f3b5d88-7f3c-4e74-8e68-2408ba762b0f", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " minx miny maxx maxy\n", + "0 24.32561 58.956597 28.877931 61.185413\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "explore_search_extent(ItemCollection([best_l8_match]))" + ] + }, + { + "cell_type": "markdown", + "id": "66c13807-50a3-4a8c-8a89-33002eafabe6", + "metadata": {}, + "source": [ + "If everything worked correctly, the geographic scope of the Landsat 8 scene should encompass a much larger surface area than the Sentinel-2 source and label chips. From here we need to crop the image down and make sure the chips from both products match." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "258a1d9e-55aa-4781-a0f5-e77ace273240", + "metadata": {}, + "outputs": [], + "source": [ + "def get_redirect_url(asset: Asset) -> str:\n", + " \"\"\"Returns the direct URL to an asset.\n", + "\n", + " Args:\n", + " asset: Asset object from an Item\n", + "\n", + " Returns:\n", + " string response URL direct to Asset\n", + " \"\"\"\n", + " response = requests.get(asset.href, allow_redirects=True)\n", + " if response.status_code == 200:\n", + " return response.url\n", + " return None" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "1e4e977d-eb14-4aa9-85df-adbe84b1732d", + "metadata": {}, + "outputs": [], + "source": [ + "s2_stack = stack(\n", + " items=ItemCollection([source_item]),\n", + " assets=BIGEARTHNET_RGB_BANDS,\n", + " epsg=rio.open(get_redirect_url(source_item.assets[\"B02\"])).crs.to_epsg(),\n", + " resolution=10,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "287f9b13-82a3-4a5d-82f2-9ab5066c6be1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
<xarray.DataArray 'stackstac-9b126e799083670f3b31d7b4408e224e' (time: 1, band: 3, y: 128, x: 128)>\n",
+       "dask.array<fetch_raster_window, shape=(1, 3, 128, 128), dtype=float64, chunksize=(1, 1, 128, 128), chunktype=numpy.ndarray>\n",
+       "Coordinates:\n",
+       "  * time                 (time) datetime64[ns] 2018-04-13T09:50:32\n",
+       "    id                   (time) <U53 'bigearthnet_v1_source_S2A_MSIL2A_201804...\n",
+       "  * band                 (band) <U3 'B04' 'B03' 'B02'\n",
+       "  * x                    (x) float64 3.9e+05 3.9e+05 ... 3.912e+05 3.912e+05\n",
+       "  * y                    (y) float64 6.67e+06 6.67e+06 ... 6.669e+06 6.669e+06\n",
+       "    instruments          <U3 'MSI'\n",
+       "    constellation        <U10 'Sentinel-2'\n",
+       "    gsd                  int64 30\n",
+       "    platform             <U10 'Sentinel-2'\n",
+       "    title                (band) <U35 'S2A_MSIL2A_20180413T95032_75_25_B04' .....\n",
+       "    common_name          (band) <U5 'Red' 'Green' 'Blue'\n",
+       "    center_wavelength    object None\n",
+       "    full_width_half_max  object None\n",
+       "    epsg                 int64 32635\n",
+       "Attributes:\n",
+       "    spec:        RasterSpec(epsg=32635, bounds=(389960, 6668780, 391240, 6670...\n",
+       "    crs:         epsg:32635\n",
+       "    transform:   | 10.00, 0.00, 389960.00|\\n| 0.00,-10.00, 6670060.00|\\n| 0.0...\n",
+       "    resolution:  10
" + ], + "text/plain": [ + "\n", + "dask.array\n", + "Coordinates:\n", + " * time (time) datetime64[ns] 2018-04-13T09:50:32\n", + " id (time) " + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "s2_stack[0].plot(col=\"band\")" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "cc2326b7-d1cc-4474-82b6-95f7da05f897", + "metadata": {}, + "outputs": [], + "source": [ + "l8_original = stack(\n", + " items=ItemCollection([best_l8_match]), assets=LANDSAT_8_RGB_BANDS, resolution=10\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "id": "aa276eef-73ea-4834-a6a1-7cbd205f3d46", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
<xarray.DataArray 'stackstac-5d0f877107914bcb285ac37e66d7bcc9' (time: 1, band: 3, y: 24754, x: 24514)>\n",
+       "dask.array<fetch_raster_window, shape=(1, 3, 24754, 24514), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>\n",
+       "Coordinates:\n",
+       "  * time                         (time) datetime64[ns] 2018-05-10T09:22:33.46...\n",
+       "    id                           (time) <U31 'LC08_L2SP_187018_20180510_02_T1'\n",
+       "  * band                         (band) <U5 'SR_B4' 'SR_B3' 'SR_B2'\n",
+       "  * x                            (x) float64 3.561e+05 3.561e+05 ... 6.012e+05\n",
+       "  * y                            (y) float64 6.785e+06 6.785e+06 ... 6.538e+06\n",
+       "    landsat:processing_level     <U4 'L2SP'\n",
+       "    landsat:wrs_type             <U1 '2'\n",
+       "    landsat:cloud_cover_land     float64 0.01\n",
+       "    instruments                  object {'oli', 'tirs'}\n",
+       "    landsat:collection_number    <U2 '02'\n",
+       "    landsat:wrs_path             <U3 '187'\n",
+       "    view:sun_elevation           float64 46.7\n",
+       "    eo:cloud_cover               float64 0.01\n",
+       "    platform                     <U9 'landsat-8'\n",
+       "    landsat:collection_category  <U2 'T1'\n",
+       "    landsat:wrs_row              <U3 '018'\n",
+       "    view:off_nadir               int64 0\n",
+       "    landsat:scene_id             <U21 'LC81870182018130LGN00'\n",
+       "    description                  (band) <U56 'Collection 2 Level-2 Red Band (...\n",
+       "    proj:epsg                    int64 32635\n",
+       "    view:sun_azimuth             float64 163.4\n",
+       "    proj:bbox                    object {6537585.0, 6785115.0, 356085.0, 6012...\n",
+       "    proj:shape                   object {8251, 8171}\n",
+       "    title                        (band) <U15 'Red Band (B4)' ... 'Blue Band (...\n",
+       "    proj:transform               object {0.0, -30.0, 356085.0, 6785115.0, 30.0}\n",
+       "    gsd                          float64 30.0\n",
+       "    common_name                  (band) <U5 'red' 'green' 'blue'\n",
+       "    center_wavelength            (band) float64 0.65 0.56 0.48\n",
+       "    full_width_half_max          (band) float64 0.04 0.06 0.06\n",
+       "    epsg                         int64 32635\n",
+       "Attributes:\n",
+       "    spec:        RasterSpec(epsg=32635, bounds=(356080, 6537580, 601220, 6785...\n",
+       "    crs:         epsg:32635\n",
+       "    transform:   | 10.00, 0.00, 356080.00|\\n| 0.00,-10.00, 6785120.00|\\n| 0.0...\n",
+       "    resolution:  10
" + ], + "text/plain": [ + "\n", + "dask.array\n", + "Coordinates:\n", + " * time (time) datetime64[ns] 2018-05-10T09:22:33.46...\n", + " id (time) \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
<xarray.DataArray 'stackstac-dc63732d805d775831b0a32ce9a525a3' (time: 1, band: 3, y: 128, x: 128)>\n",
+       "dask.array<fetch_raster_window, shape=(1, 3, 128, 128), dtype=float64, chunksize=(1, 1, 128, 128), chunktype=numpy.ndarray>\n",
+       "Coordinates:\n",
+       "  * time                         (time) datetime64[ns] 2018-05-10T09:22:33.46...\n",
+       "    id                           (time) <U31 'LC08_L2SP_187018_20180510_02_T1'\n",
+       "  * band                         (band) <U5 'SR_B4' 'SR_B3' 'SR_B2'\n",
+       "  * x                            (x) float64 3.9e+05 3.9e+05 ... 3.912e+05\n",
+       "  * y                            (y) float64 6.67e+06 6.67e+06 ... 6.669e+06\n",
+       "    landsat:processing_level     <U4 'L2SP'\n",
+       "    landsat:wrs_type             <U1 '2'\n",
+       "    landsat:cloud_cover_land     float64 0.01\n",
+       "    instruments                  object {'oli', 'tirs'}\n",
+       "    landsat:collection_number    <U2 '02'\n",
+       "    landsat:wrs_path             <U3 '187'\n",
+       "    view:sun_elevation           float64 46.7\n",
+       "    eo:cloud_cover               float64 0.01\n",
+       "    platform                     <U9 'landsat-8'\n",
+       "    landsat:collection_category  <U2 'T1'\n",
+       "    landsat:wrs_row              <U3 '018'\n",
+       "    view:off_nadir               int64 0\n",
+       "    landsat:scene_id             <U21 'LC81870182018130LGN00'\n",
+       "    description                  (band) <U56 'Collection 2 Level-2 Red Band (...\n",
+       "    proj:epsg                    int64 32635\n",
+       "    view:sun_azimuth             float64 163.4\n",
+       "    proj:bbox                    object {6537585.0, 6785115.0, 356085.0, 6012...\n",
+       "    proj:shape                   object {8251, 8171}\n",
+       "    title                        (band) <U15 'Red Band (B4)' ... 'Blue Band (...\n",
+       "    proj:transform               object {0.0, -30.0, 356085.0, 6785115.0, 30.0}\n",
+       "    gsd                          float64 30.0\n",
+       "    common_name                  (band) <U5 'red' 'green' 'blue'\n",
+       "    center_wavelength            (band) float64 0.65 0.56 0.48\n",
+       "    full_width_half_max          (band) float64 0.04 0.06 0.06\n",
+       "    epsg                         int64 32635\n",
+       "Attributes:\n",
+       "    spec:        RasterSpec(epsg=32635, bounds=(389960, 6668780, 391240, 6670...\n",
+       "    crs:         epsg:32635\n",
+       "    transform:   | 10.00, 0.00, 389960.00|\\n| 0.00,-10.00, 6670060.00|\\n| 0.0...\n",
+       "    resolution:  10
" + ], + "text/plain": [ + "\n", + "dask.array\n", + "Coordinates:\n", + " * time (time) datetime64[ns] 2018-05-10T09:22:33.46...\n", + " id (time) " + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "l8_cropped[0].plot(col=\"band\")" + ] + }, + { + "cell_type": "markdown", + "id": "bc46c85b-daa9-44df-9be7-62dfa1234b25", + "metadata": {}, + "source": [ + "Now we have a cropped Landsat 8 chip that spatially and temporally matches our Sentinel-2 source imagery and label sample from the BigEarthNet dataset." + ] + }, + { + "cell_type": "markdown", + "id": "3a70b063-d273-4aee-864b-07e318388890", + "metadata": {}, + "source": [ + "### Launch a Dask gateway cluster for parallel processing" + ] + }, + { + "cell_type": "markdown", + "id": "cb650d80-bf1a-4aad-8f8b-08a612e28aae", + "metadata": {}, + "source": [ + "We will use Dask to speed up the processing of hundreds of Landsat 8 scenes by parallelizing the workflow with a delayed computation graph. The Dask Client schedules and runs the delayed computations and gathers the results. On Planetary Computer, Dask Gateway provides a secure, centralized way to manage clusters. The cell below creates a `distributed.Client()` with default settings, which starts a local cluster." + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "id": "29531759-6d19-4010-8401-eb947a32c515", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Client: Client-270dde92-d212-11ec-8442-52879e68a5a2\n",
+       "Connection method: Cluster object | Cluster type: distributed.LocalCluster\n",
+       "Dashboard: http://127.0.0.1:8787/status\n",
+       "Cluster Info: LocalCluster 9bbc5757 | Dashboard: http://127.0.0.1:8787/status | Workers: 4 | Total threads: 8 | Total memory: 16.00 GiB | Status: running | Using processes: True\n",
+       "Scheduler Info: Scheduler-01ba10f6-8fc1-44d9-896e-3e746fe1048b | Comm: tcp://127.0.0.1:49767 | Workers: 4 | Dashboard: http://127.0.0.1:8787/status | Total threads: 8 | Started: Just now | Total memory: 16.00 GiB\n",
+       "Worker: 0 | Comm: tcp://127.0.0.1:49781 | Total threads: 2 | Dashboard: http://127.0.0.1:49785/status | Memory: 4.00 GiB | Nanny: tcp://127.0.0.1:49770 | Local directory: /Users/kendallsmith/radiant/repos/PlanetaryComputerExamples/tutorials/dask-worker-space/worker-0h0k44fc\n",
+       "Worker: 1 | Comm: tcp://127.0.0.1:49778 | Total threads: 2 | Dashboard: http://127.0.0.1:49784/status | Memory: 4.00 GiB | Nanny: tcp://127.0.0.1:49772 | Local directory: /Users/kendallsmith/radiant/repos/PlanetaryComputerExamples/tutorials/dask-worker-space/worker-8evp6yze\n",
+       "Worker: 2 | Comm: tcp://127.0.0.1:49780 | Total threads: 2 | Dashboard: http://127.0.0.1:49783/status | Memory: 4.00 GiB | Nanny: tcp://127.0.0.1:49771 | Local directory: /Users/kendallsmith/radiant/repos/PlanetaryComputerExamples/tutorials/dask-worker-space/worker-4xh8i4d4\n",
+       "Worker: 3 | Comm: tcp://127.0.0.1:49779 | Total threads: 2 | Dashboard: http://127.0.0.1:49782/status | Memory: 4.00 GiB | Nanny: tcp://127.0.0.1:49773 | Local directory: /Users/kendallsmith/radiant/repos/PlanetaryComputerExamples/tutorials/dask-worker-space/worker-9dvpa_0q
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 70, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "client = distributed.Client() # you can configure Dask client parameters here\n", + "client" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "id": "55d9dc11-d3b8-4edc-a5fd-acb2faba1c18", + "metadata": {}, + "outputs": [], + "source": [ + "# client.close()" + ] + }, + { + "cell_type": "markdown", + "id": "f3b09697-b6ab-4026-bfcb-2f5214b03f5c", + "metadata": {}, + "source": [ + "### Scale the workflow using Dask Delayed" + ] + }, + { + "cell_type": "markdown", + "id": "5368c39f-94a1-41ba-acc0-fd18c8dc1c18", + "metadata": {}, + "source": [ + "These two helper functions encapsulate creating the cropped Landsat 8 chips and writing them to disk. The first is wrapped with Dask Delayed so that the chips can be created in parallel." + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "c924acb1-092e-4f86-b73f-5b56ccdebe27", + "metadata": {}, + "outputs": [], + "source": [ + "def create_landsat_8_dataarray(item_path: str) -> DataArray:\n", + "    \"\"\"Creates a Landsat 8 chip from BigEarthNet label chip.\n", + "\n", + "    Args:\n", + "        item_path: string path to the label item on disk\n", + "\n", + "    Returns:\n", + "        Landsat 8 DataArray that has been cropped to label bbox,\n", + "        or None if no matching Landsat 8 scene was found\n", + "    \"\"\"\n", + "    # read label Item object\n", + "    label_item = Item.from_file(\n", + "        os.path.join(TMP_DIR, BIGEARTHNET_LABEL_COLLECTION, item_path)\n", + "    )\n", + "\n", + "    # fetch the Landsat 8 scene that best matches the label\n", + "    s2_source, l8_match = get_landsat_8_match(label_item)\n", + "\n", + "    if l8_match:\n", + "        # crop L8 match to S2 dims and read image data\n", + "        l8_stack = stack(\n", + "            items=ItemCollection([l8_match]),\n", + "            assets=LANDSAT_8_RGB_BANDS,\n", + "            bounds_latlon=s2_source.bbox,\n", + "            resolution=10,\n", + "        )\n", + "\n", + "        return l8_stack\n", + "    return None" + ] + }, + { + "cell_type": "code", +
"execution_count": 61, + "id": "02b974fe-0c6f-40a3-ad63-2f4753e0236b", + "metadata": {}, + "outputs": [], + "source": [ + "def write_tifs_bands(l8_array: DataArray, l8_item_id: str) -> None:\n", + "    \"\"\"Writes a GeoTIFF for each band in Landsat 8 DataArray\n", + "\n", + "    Args:\n", + "        l8_array: the DataArray object created from the BigEarthNet label item\n", + "        l8_item_id: the Landsat 8 item ID, used for the output folder and file names\n", + "    \"\"\"\n", + "    # write cropped L8 DataArray to a tiff file for each band\n", + "    for _band in LANDSAT_8_RGB_BANDS:\n", + "        l8_band_img = l8_array.sel(band=_band)\n", + "        l8_band_filename = os.path.join(\n", + "            TMP_DIR, OUTPUT_DIR, l8_item_id, f\"{l8_item_id}_{_band}.tiff\"\n", + "        )\n", + "        Path(os.path.split(l8_band_filename)[0]).mkdir(parents=True, exist_ok=True)\n", + "        l8_band_img[0].rio.to_raster(l8_band_filename)" + ] + }, + { + "cell_type": "markdown", + "id": "d5a97950-5b08-4e79-ab15-3736697d0584", + "metadata": {}, + "source": [ + "The cells below first time `create_landsat_8_dataarray` eagerly on five label Items, then set the stage for the Dask task scheduler by wrapping a call to the function in `delayed` for every label Item. Nothing in the task graph is executed until `compute()` is called."
+ ] + }, + { + "cell_type": "code", + "execution_count": 72, + "id": "79668303-468f-4f28-9bf0-6121557acc3d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 1.1 s, sys: 481 ms, total: 1.58 s\n", + "Wall time: 7.89 s\n" + ] + } + ], + "source": [ + "%%time\n", + "results = []\n", + "for item_path in label_item_sample[0:5]:\n", + " results.append(create_landsat_8_dataarray(item_path))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8608333f-8e49-4595-863f-1217d5d141c9", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "088ce116-526a-4657-9c44-f51cffb9326e", + "metadata": {}, + "outputs": [], + "source": [ + "task_pool = []\n", + "\n", + "for item_path in label_item_sample:\n", + " delayed_task = delayed(create_landsat_8_dataarray)(item_path)\n", + " task_pool.append(delayed_task)" + ] + }, + { + "cell_type": "markdown", + "id": "92137ec9-748b-4126-9984-11f4f8d6ec26", + "metadata": {}, + "source": [ + "Now we will persist the objects into memory and run the computations to create our DataArrays." + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "id": "aa01785b-b490-456b-a6c7-3d439275773f", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 48.2 s, sys: 1.72 s, total: 50 s\n", + "Wall time: 49.9 s\n" + ] + } + ], + "source": [ + "%%time\n", + "task_pool_local = compute(*task_pool, sync=True)" + ] + }, + { + "cell_type": "markdown", + "id": "6f95d76c-dc3e-4591-a596-a95e27a3dbde", + "metadata": {}, + "source": [ + "Lastly, we want to write a GeoTIFF to disk for each band of each Landsat 8 DataArray we created." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 64, + "id": "e1436d4d-10ac-4b39-88fb-5150fe9df12b", + "metadata": {}, + "outputs": [ + { + "ename": "CancelledError", + "evalue": "create_landsat_8_dataarray-99097cdb-3ce8-4543-a1de-47ff6bf8cdc5", + "output_type": "error", + "traceback": [ + "---------------------------------------------------------------------------", + "CancelledError Traceback (most recent call last)", + "File :2, in\n", + "File ~/opt/anaconda3/envs/mlhub/lib/python3.9/site-packages/dask/base.py:288, in DaskMethodsMixin.compute(self, **kwargs)\n 264 def compute(self, **kwargs):\n 265 \"\"\"Compute this dask collection\n 266 \n 267 This turns a lazy Dask collection into its in-memory equivalent.\n (...)\n 286 dask.base.compute\n 287 \"\"\"\n--> 288 (result,) = compute(self, traverse=False, **kwargs)\n 289 return result\n", + "File ~/opt/anaconda3/envs/mlhub/lib/python3.9/site-packages/dask/base.py:571, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)\n 568 keys.append(x.__dask_keys__())\n 569 postcomputes.append(x.__dask_postcompute__())\n--> 571 results = schedule(dsk, keys, **kwargs)\n 572 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])\n", + "File ~/opt/anaconda3/envs/mlhub/lib/python3.9/site-packages/distributed/client.py:2671, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)\n 2615 def get(\n 2616 self,\n 2617 dsk,\n (...)\n 2629 **kwargs,\n 2630 ):\n 2631 \"\"\"Compute dask graph\n 2632 \n 2633 Parameters\n (...)\n 2669 Client.compute : Compute asynchronous collections\n 2670 \"\"\"\n-> 2671 futures = self._graph_to_futures(\n 2672 dsk,\n 2673 keys=set(flatten([keys])),\n 2674 workers=workers,\n 2675 allow_other_workers=allow_other_workers,\n 2676 resources=resources,\n 2677 fifo_timeout=fifo_timeout,\n 2678 retries=retries,\n 2679 user_priority=priority,\n 2680 actors=actors,\n 2681 )\n 2682 packed = pack_data(keys, futures)\n 2683 if sync:\n", + "File ~/opt/anaconda3/envs/mlhub/lib/python3.9/site-packages/distributed/client.py:2596, in Client._graph_to_futures(self, dsk, keys, workers, allow_other_workers, priority, user_priority, resources, retries, fifo_timeout, actors)\n 2594 # Pack the high level graph before sending it to the scheduler\n 2595 keyset = set(keys)\n-> 2596 dsk = dsk.__dask_distributed_pack__(self, keyset, annotations)\n 2598 # Create futures before sending graph (helps avoid contention)\n 2599 futures = {key: Future(key, self, inform=False) for key in keyset}\n", + "File ~/opt/anaconda3/envs/mlhub/lib/python3.9/site-packages/dask/highlevelgraph.py:1076, in HighLevelGraph.__dask_distributed_pack__(self, client, client_keys, annotations)\n 1070 layers = []\n 1071 for layer in (self.layers[name] for name in self._toposort_layers()):\n 1072 layers.append(\n 1073 {\n 1074 \"__module__\": layer.__module__,\n 1075 \"__name__\": type(layer).__name__,\n-> 1076 \"state\": layer.__dask_distributed_pack__(\n 1077 self.get_all_external_keys(),\n 1078 self.key_dependencies,\n 1079 client,\n 1080 client_keys,\n 1081 ),\n 1082 \"annotations\": layer.__dask_distributed_annotations_pack__(\n 1083 annotations\n 1084 ),\n 1085 }\n 1086 )\n 1087 return {\"layers\": layers}\n", + "File ~/opt/anaconda3/envs/mlhub/lib/python3.9/site-packages/dask/highlevelgraph.py:401, in Layer.__dask_distributed_pack__(self, all_hlg_keys, known_key_dependencies, client, client_keys)\n 397 raise ValueError(\n 398 \"Inputs contain futures that were created by another client.\"\n 399 )\n 400 if stringify(future.key) not in client.futures:\n--> 401 raise CancelledError(stringify(future.key))\n 403 # Calculate dependencies without re-calculating already known dependencies\n 404 # - Start with known dependencies\n 405 dependencies = known_key_dependencies.copy()\n", + "CancelledError: create_landsat_8_dataarray-99097cdb-3ce8-4543-a1de-47ff6bf8cdc5" + ] + } + ], + "source": [ + "%%time\n", + "for l8_array in task_pool_local:\n", + "    if isinstance(l8_array, DataArray):\n", + "        write_tifs_bands(l8_array, l8_array.id.values[0])" + ] + }, + { + "cell_type": "markdown", + "id": "2bde6ce7-9d7a-4114-88ba-56e4a4bea247", + "metadata": {}, + "source": [ + "The following cells check that folders with images were written to disk. If there is a discrepancy between the sample size and the output, it's likely that no matching Landsat 8 scene was found for some Sentinel-2 source Items, given their geometry and datetime parameters."
+ ] + }, + { + "cell_type": "code", + "execution_count": 59, + "id": "3c3dd9a8-2a97-4677-a94a-85377572baa2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['com.apple.ScreenSaver.Engine.legacyScreenSaver',\n", + " 'com.apple.PressAndHold',\n", + " 'com.apple.avconferenced',\n", + " 'com.google.drivefs.finderhelper.findersync',\n", + " 'com.apple.appleaccountd',\n", + " 'com.apple.replayd',\n", + " 'com.apple.ScreenTimeAgent',\n", + " 'com.displaylink.DisplayLinkLoginHelper',\n", + " 'com.apple.photos.ImageConversionService',\n", + " 'com.apple.FaceTime.FTConversationService',\n", + " 'qipc_systemsem_BoseWebUpdater4a43d835f8e2d8ba59d120a7df8bcbb19328cd88',\n", + " 'com.apple.transparencyd',\n", + " 'com.apple.AppSSOAgent',\n", + " 'com.apple.quicklook.QuickLookUIService',\n", + " 'com.apple.siriactionsd',\n", + " 'com.apple.progressd',\n", + " 'com.apple.photolibraryd',\n", + " 'com.apple.trustd',\n", + " 'com.apple.siri.media-indexer',\n", + " 'com.apple.triald',\n", + " '.com.google.Chrome.P6bcfT',\n", + " 'dropbox-electron.Fw4ulbps3nRW',\n", + " 'com.apple.ap.promotedcontentd',\n", + " 'com.apple.MobileSMS.spotlight',\n", + " 'com.apple.AMPDeviceDiscoveryAgent',\n", + " 'BoseUpdater.log',\n", + " 'com.apple.CalendarNotification.CalNCService',\n", + " 'com.apple.akd',\n", + " 'com.apple.ScreenSaver.Engine',\n", + " 'contentlinkingd',\n", + " 'com.apple.proactiveeventtrackerd',\n", + " 'com.apple.Safari.CacheDeleteExtension',\n", + " 'com.apple.SafariLaunchAgent',\n", + " 'com.getdropbox.dropbox.garcon',\n", + " 'com.apple.ScopedBookmarkAgent',\n", + " 'com.apple.TelephonyUtilities',\n", + " 'com.apple.amp.mediasharingd',\n", + " '.LINKS',\n", + " 'AudioComponentRegistrar',\n", + " 'com.apple.inputmethod.EmojiFunctionRowItem',\n", + " 'bigearthnet_v1_labels',\n", + " 'com.apple.mapspushd',\n", + " 'com.apple.QuickLookThumbnailing.extension.ThumbnailExtension-macOS',\n", + " 'com.apple.photoanalysisd',\n", + " 'internal',\n", + " 
'com.apple.appstoreagent',\n", + " 'com.apple.notificationcenterui',\n", + " 'com.apple.BiomeAgent',\n", + " 'TemporaryItems',\n", + " 'com.apple.parsec-fbf',\n", + " 'com.apple.useractivityd',\n", + " 'com.google.drivefs.finderhelper',\n", + " 'com.apple.tipsd',\n", + " 'com.apple.fileproviderd',\n", + " 'com.displaylink.DisplayLinkUserAgent',\n", + " 'com.apple.dmd',\n", + " 'com.apple.nsurlsessiond',\n", + " 'recommendations',\n", + " 'com.apple.Music.MusicCacheExtension',\n", + " 'com.apple.ScreenSaver.Computer-Name',\n", + " 'com.apple.MailCacheDelete',\n", + " 'com.apple.remindd',\n", + " 'StatusKitAgent',\n", + " 'C071006B-016E-4501-A734-551A420E24DA',\n", + " 'com.apple.parsecd',\n", + " 'icdd',\n", + " 'com.apple.identityservicesd',\n", + " 'com.apple.sharingd',\n", + " 'com.apple.geod',\n", + " 'com.apple.TV.TVCacheExtension',\n", + " 'com.apple.CalendarAgent',\n", + " 'com.apple.bird',\n", + " 'com.apple.cloudd',\n", + " 'com.apple.ScreenSaver.Engine.legacyScreenSaver.x86-64',\n", + " 'com.apple.cloudkit.upload-request.cache',\n", + " 'com.apple.wifivelocity',\n", + " 'com.apple.routined',\n", + " '.AddressBookLocks',\n", + " 'metrickitd',\n", + " '.CalendarLocks',\n", + " 'com.apple.ScreenSaver.iLife-Slideshow-Extension',\n", + " 'com.tinyspeck.slackmacgap',\n", + " 'homed',\n", + " 'com.apple.sociallayerd',\n", + " 'com.apple.corespeechd',\n", + " 'com.apple.UsageTrackingAgent',\n", + " 'com.apple.quicklook.ThumbnailsAgent',\n", + " 'com.apple.siri-distributed-evaluation',\n", + " 'com.apple.pluginkit',\n", + " 'diagnosticextensionsd',\n", + " 'journeys',\n", + " 'com.apple.donotdisturbd',\n", + " 'com.apple.quicklook.satellite.general',\n", + " 'com.apple.tccd',\n", + " 'itunescloudd',\n", + " 'com.apple.CloudDocsDaemon.container-metadata-extractor',\n", + " 'com.apple.imagent',\n", + " 'qipc_sharedmemory_BoseWebUpdater4a43d835f8e2d8ba59d120a7df8bcbb19328cd88',\n", + " 'com.apple.AirPlayUIAgent',\n", + " '4FD3E732-58B2-4F9E-823B-DA776BAC658B',\n", + " 
'com.apple.imdpersistence.IMDPersistenceAgent',\n", + " 'proactived',\n", + " 'com.apple.ap.adprivacyd',\n", + " 'analytics',\n", + " '.keystoneAgentLock',\n", + " 'com.apple.mediaanalysisd',\n", + " 'com.apple.OSDUIHelper',\n", + " 'bigearthnet_v1_labels.tar.gz']" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "os.listdir(TMP_DIR)" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "id": "ec64a009-9a87-4fb2-bf5b-fbc41a3f8021", + "metadata": {}, + "outputs": [ + { + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: '/var/folders/87/c2vwc00s3rq1xw26bz5j25rh0000gn/T/landsat_8_source'", + "output_type": "error", + "traceback": [ + "---------------------------------------------------------------------------", + "FileNotFoundError Traceback (most recent call last)", + "Input In [62], in\n 1 landsat_chip_dir = os.path.join(TMP_DIR, OUTPUT_DIR)\n----> 2 len(os.listdir(landsat_chip_dir))\n", + "FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/87/c2vwc00s3rq1xw26bz5j25rh0000gn/T/landsat_8_source'" + ] + } + ], + "source": [ + "landsat_chip_dir = os.path.join(TMP_DIR, OUTPUT_DIR)\n", + "len(os.listdir(landsat_chip_dir))" + ] + }, + { + "cell_type": "markdown", + "id": "e0884796-bfab-4905-aa75-2f8dd43a5a13", + "metadata": {}, + "source": [ + "Open one of the new Landsat 8 chips to inspect what it looks like."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "beb3d5a8-2677-43dc-867a-153eb1e087f9", + "metadata": {}, + "outputs": [], + "source": [ + "landsat_images = glob(f\"{landsat_chip_dir}/**/*.tiff\", recursive=True)\n", + "first_l8_img = rioxarray.open_rasterio(landsat_images[0])\n", + "first_l8_img.plot()" + ] + }, + { + "cell_type": "markdown", + "id": "37581811-0f21-4838-9541-db84688032f6", + "metadata": {}, + "source": [ + "Shut down the Dask client to clean up cluster resources." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1d643476-917b-484b-a041-8f3c94d12c06", + "metadata": {}, + "outputs": [], + "source": [ + "client.shutdown()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e80c402a-81e6-46fb-bbb5-2c5fc79c8884", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}
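The per-band output layout used by `write_tifs_bands` and the folder count in the verification cells can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the notebook's own code: `band_path` and `count_chip_dirs` are hypothetical helper names, the band list is copied from the DataArray output shown in the notebook, the `landsat_8_source` folder name is taken from the path in the `FileNotFoundError` message, and empty placeholder files stand in for the real GeoTIFF writes. It shows one way to make the count tolerate a missing output directory, which is the situation in which `os.listdir` raises `FileNotFoundError`.

```python
import tempfile
from pathlib import Path

# Assumption: band names copied from the DataArray output in the notebook.
LANDSAT_8_RGB_BANDS = ["SR_B4", "SR_B3", "SR_B2"]


def band_path(root: Path, item_id: str, band: str) -> Path:
    """Hypothetical helper: <root>/<item_id>/<item_id>_<band>.tiff."""
    return root / item_id / f"{item_id}_{band}.tiff"


def count_chip_dirs(root: Path) -> int:
    """Hypothetical helper: number of per-item folders, 0 if root is missing."""
    if not root.is_dir():
        return 0
    return sum(1 for entry in root.iterdir() if entry.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    # Assumption: folder name taken from the FileNotFoundError message.
    out_root = Path(tmp) / "landsat_8_source"

    # Nothing has been written yet, so the directory does not exist.
    missing_count = count_chip_dirs(out_root)

    item_id = "LC08_L2SP_187018_20180510_02_T1"
    for band in LANDSAT_8_RGB_BANDS:
        target = band_path(out_root, item_id, band)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()  # placeholder for the real GeoTIFF write

    written_count = count_chip_dirs(out_root)
    written_files = sorted(p.name for p in (out_root / item_id).iterdir())
```

Returning zero for a missing directory keeps the check usable when no chip was written at all, for example when the write loop failed before creating any folder.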