Commit a083a0f

add workflow for copying STAC metadata from another STAC API into eoapi (#38)
1 parent aa9df50 commit a083a0f

5 files changed: +268 -0 lines changed
README.md

Lines changed: 46 additions & 0 deletions
@@ -143,3 +143,49 @@ Then, deploy
 ```
 uv run npx cdk deploy --all --require-approval never
 ```
+
+## Loading new collections/items
+
+### Copy collection from another STAC API
+
+One workflow for adding collections and items is to clone the metadata from another STAC API.
+You can do this with the following workflow:
+
+1. Set pgstac database credentials
+
+- for docker network:
+
+```bash
+export PGUSER=username
+export PGPASSWORD=password
+export PGHOST=localhost
+export PGPORT=5439
+export PGDATABASE=postgis
+```
+
+- for AWS deployment
+
+```bash
+AWS_REGION=us-west-2 source scripts/get-pgstac-creds.sh $EOAPI_PGSTAC_SECRET_ARN
+```
+
+> [!NOTE]
+> this will load the secret from AWS and set the Postgres environment variables
+
+2. Run the `load` script pointed at an external API and collection:
+
+```bash
+uv sync --group load
+uv run scripts/load --stac-api https://stac.earthgenome.org --collection-id sentinel2-temporal-mosaics
+```
+
+This can be helpful for loading collections and items into your local docker network!
+
+```bash
+export PGUSER=username
+export PGPASSWORD=password
+export PGHOST=localhost
+export PGPORT=5439
+export PGDATABASE=postgis
+uv run scripts/load --stac-api https://stac.eoapi.dev --collection-id MAXAR_Maui_Hawaii_fires_Aug_23 --force
+```
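
The README steps rely on five standard Postgres variables being set before `scripts/load` runs, and the script interpolates them into a connection URL. A minimal sketch of that step, with an explicit check for unset variables, might look like the following. `REQUIRED_VARS` and `pg_dsn_from_env` are hypothetical names that do not appear in the commit, and percent-encoding the credentials is an added assumption (the committed script interpolates them as-is).

```python
from urllib.parse import quote

# The five variables named in the README and in the scripts/load docstring.
REQUIRED_VARS = ("PGUSER", "PGPASSWORD", "PGHOST", "PGPORT", "PGDATABASE")


def pg_dsn_from_env(env):
    """Build a postgresql:// URL from a mapping of environment variables.

    Hypothetical helper: raises ValueError listing any variable that is
    unset or empty instead of producing a URL containing "None".
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(
            "missing Postgres environment variables: " + ", ".join(missing)
        )
    # Assumption: percent-encode credentials so characters such as "@" or "/"
    # in a password do not break URL parsing.
    user = quote(env["PGUSER"], safe="")
    password = quote(env["PGPASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{env['PGHOST']}:{env['PGPORT']}/{env['PGDATABASE']}"
    )
```

Calling `pg_dsn_from_env(os.environ)` after the `export` lines above would yield a URL of the same shape the script passes to its database client.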

pyproject.toml

Lines changed: 4 additions & 0 deletions
@@ -22,3 +22,7 @@ dev = [
     "pypgstac==0.9.3",
     "pytest>=8.3.4",
 ]
+load = [
+    "pgstacrs>=0.1.1",
+    "stacrs>=0.5.9",
+]

scripts/get-pgstac-creds.sh

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+#!/bin/bash
+
+if [ -z "$1" ]; then
+    echo "Error: Secret ARN is required."
+    echo "Usage: $0 <secret-arn>"
+    echo "Example: $0 arn:aws:secretsmanager:region:account-id:secret:secret-name"
+    exit 1
+fi
+
+SECRET_ARN="$1"
+
+set -e
+
+echo "Retrieving secret with ARN: $SECRET_ARN"
+
+SECRET_VALUE=$(aws secretsmanager get-secret-value \
+    --secret-id "$SECRET_ARN" \
+    --query "SecretString" \
+    --output text)
+
+
+export PGHOST=$(echo "$SECRET_VALUE" | jq -r '.host')
+export PGPORT=$(echo "$SECRET_VALUE" | jq -r '.port')
+export PGDATABASE=$(echo "$SECRET_VALUE" | jq -r '.dbname')
+export PGUSER=$(echo "$SECRET_VALUE" | jq -r '.username')
+export PGPASSWORD=$(echo "$SECRET_VALUE" | jq -r '.password')
+
+export DATABASE_URL="postgresql://$PGUSER:$PGPASSWORD@$PGHOST:$PGPORT/$PGDATABASE"
+
+echo "Environment variables set:"
+echo "PGHOST=$PGHOST"
+echo "PGPORT=$PGPORT"
+echo "PGDATABASE=$PGDATABASE"
+echo "PGUSER=$PGUSER"
+echo "PGPASSWORD=********"
+echo "DATABASE_URL=postgresql://$PGUSER:********@$PGHOST:$PGPORT/$PGDATABASE"
+
+echo "Database credentials have been set as environment variables."
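
The script above maps five keys of the secret's JSON payload (`host`, `port`, `dbname`, `username`, `password`) onto the Postgres environment variables and masks the password when echoing them. The same mapping can be sketched in Python, assuming the `SecretString` has already been fetched (for example with the `aws` CLI call shown in the script). `SECRET_KEY_TO_ENV`, `pg_env_from_secret`, and `masked` are hypothetical names used only for illustration, and no AWS call is made here.

```python
import json

# Secret JSON key -> Postgres environment variable, mirroring the jq lookups.
SECRET_KEY_TO_ENV = {
    "host": "PGHOST",
    "port": "PGPORT",
    "dbname": "PGDATABASE",
    "username": "PGUSER",
    "password": "PGPASSWORD",
}


def pg_env_from_secret(secret_string):
    """Translate a secret's JSON string into Postgres environment variables."""
    secret = json.loads(secret_string)
    missing = sorted(key for key in SECRET_KEY_TO_ENV if key not in secret)
    if missing:
        raise ValueError("secret is missing keys: " + ", ".join(missing))
    # Environment values are strings, so a numeric port is converted.
    return {
        env_name: str(secret[key]) for key, env_name in SECRET_KEY_TO_ENV.items()
    }


def masked(env):
    """Return a copy of env that is safe to print, hiding the password."""
    return {
        name: "********" if name == "PGPASSWORD" else value
        for name, value in env.items()
    }
```

A caller could apply the result with `os.environ.update(pg_env_from_secret(secret_string))` and log `masked(...)` rather than the raw values.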

scripts/load

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+#!/usr/bin/env python
+"""Fetch collection and item metadata from a STAC API and load it into your pgstac database
+
+Requires that you have the standard Postgres environment variables set:
+    PGUSER, PGPASSWORD, PGHOST, PGPORT, PGDATABASE
+
+Example:
+    uv run scripts/load \
+        --stac-api https://stac.earthgenome.org \
+        --collection-id sentinel2-temporal-mosaics
+"""
+
+import argparse
+import asyncio
+import os
+
+import httpx
+import stacrs
+from pgstacrs import Client as PgstacClient
+
+ITEM_BATCH_SIZE = 1000
+
+# optionally specify render parameters for a given collection
+RENDERS = {
+    "sentinel2-temporal-mosaics": {
+        year: {
+            "datetime": f"{year}-01-01T00:00:01Z/{year}-12-31T23:59:59Z",
+            "assets": ["B04", "B03", "B02"],
+            "rescale": [[0, 2000]],
+        }
+        for year in ["2019", "2020", "2021", "2022", "2023", "2024"]
+    }
+}
+
+
+async def load(stac_api: str, collection_id: str, force: bool = False) -> None:
+    username = os.getenv("PGUSER")
+    password = os.getenv("PGPASSWORD")
+    host = os.getenv("PGHOST")
+    port = os.getenv("PGPORT")
+    dbname = os.getenv("PGDATABASE")
+    pgstac_client = await PgstacClient.open(
+        f"postgresql://{username}:{password}@{host}:{port}/{dbname}"
+    )
+
+    collection_exists = await pgstac_client.get_collection(collection_id)
+    if collection_exists and force:
+        print(f"Deleting collection {collection_id}")
+        await pgstac_client.delete_collection(collection_id)
+
+    if await pgstac_client.get_collection(collection_id) is not None:
+        print(f"{collection_id} already exists, skipping!")
+        return
+
+    print(f"Getting collection {collection_id} from {stac_api}")
+    collection_request = httpx.get(f"{stac_api}/collections/{collection_id}")
+    collection_request.raise_for_status()
+    collection = collection_request.json()
+
+    # drop links from existing stac records
+    _ = collection.pop("links")
+
+    # add render extension metadata if defined in RENDERS
+    if renders := RENDERS.get(collection_id):
+        extensions = ["https://stac-extensions.github.io/render/v2.0.0/schema.json"]
+        extensions.extend(
+            (
+                e
+                for e in collection.get("stac_extensions", [])
+                if not e.startswith("https://stac-extensions.github.io/render")
+            )
+        )
+        collection["stac_extensions"] = extensions
+        collection["renders"] = renders
+
+    print("Getting items")
+    items = await stacrs.search(
+        href=stac_api,
+        collections=collection_id,
+        limit=100,
+    )
+
+    for item in items:
+        _ = item.pop("links")
+
+    print("Creating collection")
+    await pgstac_client.create_collection(collection)
+
+    print("Creating items")
+    for i in range(0, len(items), ITEM_BATCH_SIZE):
+        batch = items[i : i + ITEM_BATCH_SIZE]
+        print(
+            f"Processing batch {i // ITEM_BATCH_SIZE + 1}/{(len(items) + ITEM_BATCH_SIZE - 1) // ITEM_BATCH_SIZE} "
+            f"({len(batch)} items, {i}-{min(i + ITEM_BATCH_SIZE - 1, len(items) - 1)})"
+        )
+        await pgstac_client.create_items(batch)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Load data to pgstac")
+    parser.add_argument(
+        "--stac-api",
+        help="STAC API URL",
+    )
+    parser.add_argument(
+        "--collection-id",
+        help="Collection ID to load",
+    )
+    parser.add_argument(
+        "--force", action="store_true", help="Force deletion of existing collection"
+    )
+
+    args = parser.parse_args()
+
+    print(f"Using STAC API: {args.stac_api}")
+    print(f"Using Collection ID: {args.collection_id}")
+
+    asyncio.run(load(args.stac_api, args.collection_id, args.force))
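
Two steps in `load` are pure data manipulation and can be illustrated without a database or network: putting the render extension schema first in `stac_extensions` while dropping any other render schema version, and slicing the item list into fixed-size batches. The sketch below isolates both. `with_renders` and `batches` are hypothetical helper names that do not appear in the commit, and unlike the script, `with_renders` returns a new dict rather than mutating the collection in place.

```python
RENDER_EXTENSION_PREFIX = "https://stac-extensions.github.io/render"
RENDER_EXTENSION_SCHEMA = f"{RENDER_EXTENSION_PREFIX}/v2.0.0/schema.json"


def with_renders(collection, renders):
    """Return a copy of collection that declares the render extension.

    Any render schema URL already listed is replaced by the v2.0.0 URL,
    and all other extensions keep their relative order.
    """
    extensions = [RENDER_EXTENSION_SCHEMA]
    extensions.extend(
        e
        for e in collection.get("stac_extensions", [])
        if not e.startswith(RENDER_EXTENSION_PREFIX)
    )
    return {**collection, "stac_extensions": extensions, "renders": renders}


def batches(items, size):
    """Yield consecutive slices of items holding at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start : start + size]
```

With a batch size of 1000, as in `ITEM_BATCH_SIZE`, a list of 2500 items is split into slices of 1000, 1000, and 500 items, which matches the batch count the script prints via ceiling division.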

uv.lock

Lines changed: 62 additions & 0 deletions
Some generated files are not rendered by default.
