Merge pull request #6 from nasa/implement-pre-commit
owenlittlejohns authored Apr 6, 2024
2 parents ce99ba1 + 8645d1d commit e468c0a
Showing 40 changed files with 6,086 additions and 4,292 deletions.
5 changes: 5 additions & 0 deletions .git-blame-ignore-revs
@@ -0,0 +1,5 @@
# For more information, see:
# https://docs.github.com/en/repositories/working-with-files/using-files/viewing-a-file#ignore-commits-in-the-blame-view

# Black code formatting of entire repository
56dd43f69d901abbba6cfb765a98dee26ff71cfc
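
For local `git blame` to honour this ignore list, git must be pointed at the file. This is a standard git setting, not specific to this repository:

```bash
# Point git blame at the ignore-revs file (per-repository setting):
git config blame.ignoreRevsFile .git-blame-ignore-revs

# Blame now skips the black reformatting commit listed above:
git blame hoss/adapter.py
```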
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -11,6 +11,6 @@ A short description of the changes in this PR.
## PR Acceptance Checklist
* [ ] Jira ticket acceptance criteria met.
* [ ] `CHANGELOG.md` updated to include high level summary of PR changes.
* [ ] `VERSION` updated if publishing a release.
* [ ] `docker/service_version.txt` updated if publishing a release.
* [ ] Tests added/updated and passing.
* [ ] Documentation updated (if needed).
20 changes: 20 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,20 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.2.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-json
- id: check-yaml
- id: check-added-large-files
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.3.4
hooks:
- id: ruff
args: ["--fix", "--show-fixes"]
- repo: https://github.com/psf/black-pre-commit-mirror
rev: 24.3.0
hooks:
- id: black-jupyter
args: ["--skip-string-normalization"]
language_version: python3.11
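
With this configuration in place, individual hooks can also be invoked directly from the command line; this is standard `pre-commit` CLI usage:

```bash
# Run only the ruff hook against the whole repository:
pre-commit run ruff --all-files

# Update the pinned hook revisions in .pre-commit-config.yaml:
pre-commit autoupdate
```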
18 changes: 12 additions & 6 deletions CHANGELOG.md
@@ -1,13 +1,19 @@
## v1.0.4
### 2024-04-05

This version of HOSS implements `black` code formatting across the repository.
There should be no functional changes in the service.

## v1.0.3
### 2024-3-29
### 2024-03-29

This version of HOSS handles the error in the crs_wkt attribute in ATL19, where the
north polar crs variable has a leading quotation mark escaped by a backslash in the
crs_wkt attribute. This causes errors when the projection is being interpreted from
the crs variable attributes.

## v1.0.2
### 2024-2-26
### 2024-02-26

This version of HOSS correctly handles edge-aligned geographic collections by
adding the attribute `cell_alignment` with the value `edge` to `hoss_config.json`
33 changes: 33 additions & 0 deletions README.md
@@ -240,6 +240,39 @@ newest release of the code (starting at the top of the file).
## vX.Y.Z
```

### pre-commit hooks:

This repository uses [pre-commit](https://pre-commit.com/) to run checks that
enforce some coding standard best practices before each commit. These include:

* Removing trailing whitespace.
* Ensuring files end with a single newline.
* Validating JSON files.
* [ruff](https://github.com/astral-sh/ruff) Python linting checks.
* [black](https://black.readthedocs.io/en/stable/index.html) Python code
  formatting checks.

To enable these checks:

```bash
# Install pre-commit Python package as part of test requirements:
pip install -r tests/pip_test_requirements.txt

# Install the git hook scripts:
pre-commit install

# (Optional) Run against all files:
pre-commit run --all-files
```

When you try to make a new commit locally, `pre-commit` will run automatically.
If any hook detects non-compliance (e.g., trailing whitespace), it will report
a failure and, where possible, fix the issue itself. You will need to review
and `git add` those changes before you can complete the commit.

Additional hooks are planned, possibly including tools such as `mypy`; a
hypothetical sketch follows.
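
A `mypy` hook would likely mirror the existing entries in `.pre-commit-config.yaml` (the repository URL below is the standard `pre-commit` mirror; the `rev` value is a placeholder to be confirmed):

```yaml
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.9.0  # placeholder revision, to be confirmed
    hooks:
      - id: mypy
```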

## Get in touch:

You can reach out to the maintainers of this repository via email:
2 changes: 1 addition & 1 deletion docker/service_version.txt
@@ -1 +1 @@
1.0.3
1.0.4
2 changes: 1 addition & 1 deletion docker/tests.Dockerfile
@@ -16,7 +16,7 @@ ENV PYTHONDONTWRITEBYTECODE=1
COPY tests/pip_test_requirements.txt .
RUN conda run --name hoss pip install --no-input -r pip_test_requirements.txt

# Copy test directory containing Python unittest suite, test data and utilities
COPY ./tests tests

# Set conda environment to hoss, as conda run will not stream logging.
60 changes: 41 additions & 19 deletions docs/HOSS_DAAC_Operator_Documentation.ipynb
@@ -170,8 +170,10 @@
"metadata": {},
"outputs": [],
"source": [
"temporal_range = {'start': datetime(2020, 1, 1, 0, 0, 0),\n",
" 'stop': datetime(2020, 1, 31, 23, 59, 59)}"
"temporal_range = {\n",
" 'start': datetime(2020, 1, 1, 0, 0, 0),\n",
" 'stop': datetime(2020, 1, 31, 23, 59, 59),\n",
"}"
]
},
{
@@ -273,14 +275,19 @@
"outputs": [],
"source": [
"# Define the request:\n",
"variable_subset_request = Request(collection=collection, variables=[variable_to_subset], max_results=1)\n",
"variable_subset_request = Request(\n",
" collection=collection, variables=[variable_to_subset], max_results=1\n",
")\n",
"\n",
"# Submit the request and download the results\n",
"variable_subset_job_id = harmony_client.submit(variable_subset_request)\n",
"harmony_client.wait_for_processing(variable_subset_job_id, show_progress=True)\n",
"variable_subset_outputs = [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(variable_subset_job_id, overwrite=True)]\n",
"variable_subset_outputs = [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(\n",
" variable_subset_job_id, overwrite=True\n",
" )\n",
"]\n",
"\n",
"replace(variable_subset_outputs[0], 'hoss_variable_subset.nc4')\n",
"\n",
@@ -308,15 +315,22 @@
"outputs": [],
"source": [
"# Define the request:\n",
"temporal_subset_request = Request(collection=collection, temporal=temporal_range,\n",
" variables=[variable_to_subset], max_results=1)\n",
"temporal_subset_request = Request(\n",
" collection=collection,\n",
" temporal=temporal_range,\n",
" variables=[variable_to_subset],\n",
" max_results=1,\n",
")\n",
"\n",
"# Submit the request and download the results\n",
"temporal_subset_job_id = harmony_client.submit(temporal_subset_request)\n",
"harmony_client.wait_for_processing(temporal_subset_job_id, show_progress=True)\n",
"temporal_subset_outputs = [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(temporal_subset_job_id, overwrite=True)]\n",
"temporal_subset_outputs = [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(\n",
" temporal_subset_job_id, overwrite=True\n",
" )\n",
"]\n",
"\n",
"replace(temporal_subset_outputs[0], 'hoss_temporal_subset.nc4')\n",
"\n",
@@ -351,14 +365,17 @@
"outputs": [],
"source": [
"# Define the request:\n",
"bbox_subset_request = Request(collection=collection, spatial=bounding_box, max_results=1)\n",
"bbox_subset_request = Request(\n",
" collection=collection, spatial=bounding_box, max_results=1\n",
")\n",
"\n",
"# Submit the request and download the results\n",
"bbox_subset_job_id = harmony_client.submit(bbox_subset_request)\n",
"harmony_client.wait_for_processing(bbox_subset_job_id, show_progress=True)\n",
"bbox_subset_outputs = [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(bbox_subset_job_id, overwrite=True)]\n",
"bbox_subset_outputs = [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(bbox_subset_job_id, overwrite=True)\n",
"]\n",
"\n",
"replace(bbox_subset_outputs[0], 'hoss_bbox_subset.nc4')\n",
"\n",
@@ -389,14 +406,19 @@
"outputs": [],
"source": [
"# Define the request:\n",
"shape_file_subset_request = Request(collection=collection, shape='shape_files/bermuda_triangle.geo.json', max_results=1)\n",
"shape_file_subset_request = Request(\n",
" collection=collection, shape='shape_files/bermuda_triangle.geo.json', max_results=1\n",
")\n",
"\n",
"# Submit the request and download the results\n",
"shape_file_subset_job_id = harmony_client.submit(shape_file_subset_request)\n",
"harmony_client.wait_for_processing(shape_file_subset_job_id, show_progress=True)\n",
"shape_file_subset_outputs = [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(shape_file_subset_job_id, overwrite=True)]\n",
"shape_file_subset_outputs = [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(\n",
" shape_file_subset_job_id, overwrite=True\n",
" )\n",
"]\n",
"\n",
"replace(shape_file_subset_outputs[0], 'hoss_shape_file_subset.nc4')\n",
"# Inspect the results:\n",
90 changes: 63 additions & 27 deletions docs/HOSS_User_Documentation.ipynb
@@ -127,14 +127,19 @@
"source": [
"variables = ['atmosphere_cloud_liquid_water_content']\n",
"\n",
"variable_subset_request = Request(collection=ghrc_collection, variables=variables, granule_id=[ghrc_granule_id])\n",
"variable_subset_request = Request(\n",
" collection=ghrc_collection, variables=variables, granule_id=[ghrc_granule_id]\n",
")\n",
"variable_subset_job_id = harmony_client.submit(variable_subset_request)\n",
"\n",
"print(f'Processing job: {variable_subset_job_id}')\n",
"\n",
"for filename in [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(variable_subset_job_id, overwrite=True, directory=demo_directory)]:\n",
"for filename in [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(\n",
" variable_subset_job_id, overwrite=True, directory=demo_directory\n",
" )\n",
"]:\n",
" print(f'Downloaded: {filename}')"
]
},
@@ -157,14 +162,19 @@
"source": [
"gpm_bounding_box = BBox(w=45, s=-45, e=75, n=-15)\n",
"\n",
"bbox_request = Request(collection=gpm_collection, spatial=gpm_bounding_box, granule_id=[gpm_granule_id])\n",
"bbox_request = Request(\n",
" collection=gpm_collection, spatial=gpm_bounding_box, granule_id=[gpm_granule_id]\n",
")\n",
"bbox_job_id = harmony_client.submit(bbox_request)\n",
"\n",
"print(f'Processing job: {bbox_job_id}')\n",
"\n",
"for filename in [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(bbox_job_id, overwrite=True, directory=demo_directory)]:\n",
"for filename in [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(\n",
" bbox_job_id, overwrite=True, directory=demo_directory\n",
" )\n",
"]:\n",
" print(f'Downloaded: {filename}')"
]
},
@@ -196,15 +206,22 @@
"gpm_bounding_box = BBox(w=45, s=-45, e=75, n=-15)\n",
"gpm_variables = ['/Grid/precipitationCal']\n",
"\n",
"combined_request = Request(collection=gpm_collection, spatial=gpm_bounding_box,\n",
" granule_id=[gpm_granule_id], variables=gpm_variables)\n",
"combined_request = Request(\n",
" collection=gpm_collection,\n",
" spatial=gpm_bounding_box,\n",
" granule_id=[gpm_granule_id],\n",
" variables=gpm_variables,\n",
")\n",
"combined_job_id = harmony_client.submit(combined_request)\n",
"\n",
"print(f'Processing job: {combined_job_id}')\n",
"\n",
"for filename in [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(combined_job_id, overwrite=True, directory=demo_directory)]:\n",
"for filename in [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(\n",
" combined_job_id, overwrite=True, directory=demo_directory\n",
" )\n",
"]:\n",
" print(f'Downloaded: {filename}')"
]
},
@@ -229,14 +246,19 @@
"source": [
"ghrc_bounding_box = BBox(w=-30, s=-50, e=30, n=0)\n",
"\n",
"edge_request = Request(collection=ghrc_collection, spatial=ghrc_bounding_box, granule_id=[ghrc_granule_id])\n",
"edge_request = Request(\n",
" collection=ghrc_collection, spatial=ghrc_bounding_box, granule_id=[ghrc_granule_id]\n",
")\n",
"edge_job_id = harmony_client.submit(edge_request)\n",
"\n",
"print(f'Processing job: {edge_job_id}')\n",
"\n",
"for filename in [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(edge_job_id, overwrite=True, directory=demo_directory)]:\n",
"for filename in [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(\n",
" edge_job_id, overwrite=True, directory=demo_directory\n",
" )\n",
"]:\n",
" print(f'Downloaded: {filename}')"
]
},
@@ -268,15 +290,22 @@
"point_in_pixel_box = BBox(w=43.2222, s=-25.1111, e=43.2222, n=-25.1111)\n",
"gpm_variables = ['/Grid/precipitationCal']\n",
"\n",
"point_in_pixel_request = Request(collection=gpm_collection, spatial=point_in_pixel_box,\n",
" granule_id=[gpm_granule_id], variables=gpm_variables)\n",
"point_in_pixel_request = Request(\n",
" collection=gpm_collection,\n",
" spatial=point_in_pixel_box,\n",
" granule_id=[gpm_granule_id],\n",
" variables=gpm_variables,\n",
")\n",
"point_in_pixel_job_id = harmony_client.submit(point_in_pixel_request)\n",
"\n",
"print(f'Processing job: {point_in_pixel_job_id}')\n",
"\n",
"for filename in [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(point_in_pixel_job_id, overwrite=True, directory=demo_directory)]:\n",
"for filename in [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(\n",
" point_in_pixel_job_id, overwrite=True, directory=demo_directory\n",
" )\n",
"]:\n",
" print(f'Downloaded: {filename}')"
]
},
@@ -298,15 +327,22 @@
"corner_point_box = BBox(w=160, s=20, e=160, n=20)\n",
"gpm_variables = ['/Grid/precipitationCal']\n",
"\n",
"corner_point_request = Request(collection=gpm_collection, spatial=corner_point_box,\n",
" granule_id=[gpm_granule_id], variables=gpm_variables)\n",
"corner_point_request = Request(\n",
" collection=gpm_collection,\n",
" spatial=corner_point_box,\n",
" granule_id=[gpm_granule_id],\n",
" variables=gpm_variables,\n",
")\n",
"corner_point_job_id = harmony_client.submit(corner_point_request)\n",
"\n",
"print(f'Processing job: {corner_point_job_id}')\n",
"\n",
"for filename in [file_future.result()\n",
" for file_future\n",
" in harmony_client.download_all(corner_point_job_id, overwrite=True, directory=demo_directory)]:\n",
"for filename in [\n",
" file_future.result()\n",
" for file_future in harmony_client.download_all(\n",
" corner_point_job_id, overwrite=True, directory=demo_directory\n",
" )\n",
"]:\n",
" print(f'Downloaded: {filename}')"
]
}
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,4 +1,4 @@
#
# These requirements are used by the documentation Jupyter notebooks in the
# harmony-opendap-subsetter/docs directory.
#
10 changes: 6 additions & 4 deletions hoss/__main__.py
@@ -1,4 +1,5 @@
""" Run the Harmony OPeNDAP SubSetter Adapter via the Harmony CLI. """

from argparse import ArgumentParser
from sys import argv

@@ -8,12 +9,13 @@


def main(arguments: list[str]):
""" Parse command line arguments and invoke the appropriate method to
respond to them
"""Parse command line arguments and invoke the appropriate method to
respond to them
"""
parser = ArgumentParser(prog='harmony-opendap-subsetter',
description='Run Harmony OPeNDAP SubSetter.')
parser = ArgumentParser(
prog='harmony-opendap-subsetter', description='Run Harmony OPeNDAP SubSetter.'
)

setup_cli(parser)
harmony_arguments, _ = parser.parse_known_args(arguments[1:])
100 changes: 55 additions & 45 deletions hoss/adapter.py
@@ -24,6 +24,7 @@
calls to `process_item` for each granule.
"""

import shutil
from tempfile import mkdtemp
from pystac import Asset, Item
@@ -38,11 +39,12 @@


class HossAdapter(BaseHarmonyAdapter):
""" This class extends the BaseHarmonyAdapter class, to implement the
`invoke` method, which performs variable, spatial and temporal
subsetting via requests to OPeNDAP.
"""This class extends the BaseHarmonyAdapter class, to implement the
`invoke` method, which performs variable, spatial and temporal
subsetting via requests to OPeNDAP.
"""

def invoke(self):
"""
Adds validation to default process_item-based invocation
@@ -56,26 +58,26 @@ def invoke(self):
return super().invoke()

def process_item(self, item: Item, source: Source):
""" Processes a single input item. Services that are not aggregating
multiple input files should prefer to implement this method rather
than `invoke`
This example copies its input to the output, marking `variables`
and `subset.bbox` message attributes as having been processed
Parameters
----------
item : pystac.Item
the item that should be processed
source : harmony.message.Source
the input source defining the variables, if any, to subset from
the item
Returns
-------
pystac.Item
a STAC catalog whose metadata and assets describe the service
output
"""Processes a single input item. Services that are not aggregating
multiple input files should prefer to implement this method rather
than `invoke`
This example copies its input to the output, marking `variables`
and `subset.bbox` message attributes as having been processed
Parameters
----------
item : pystac.Item
the item that should be processed
source : harmony.message.Source
the input source defining the variables, if any, to subset from
the item
Returns
-------
pystac.Item
a STAC catalog whose metadata and assets describe the service
output
"""
result = item.clone()
@@ -85,34 +87,44 @@ def process_item(self, item: Item, source: Source):
workdir = mkdtemp()
try:
# Get the data file
asset = next((item_asset for item_asset in item.assets.values()
if 'opendap' in (item_asset.roles or [])), None)
asset = next(
(
item_asset
for item_asset in item.assets.values()
if 'opendap' in (item_asset.roles or [])
),
None,
)

self.logger.info(f'Collection short name: {source.shortName}')

# Invoke service logic to retrieve subset of file from OPeNDAP
output_file_path = subset_granule(asset.href, source, workdir,
self.message, self.logger,
self.config)
output_file_path = subset_granule(
asset.href, source, workdir, self.message, self.logger, self.config
)

# Stage the output file with a conventional filename
mime, _ = get_file_mimetype(output_file_path)
staged_filename = generate_output_filename(
asset.href, variable_subset=source.variables, ext='.nc4',
is_subsetted=(is_index_subset(self.message)
or len(source.variables) > 0)
asset.href,
variable_subset=source.variables,
ext='.nc4',
is_subsetted=(
is_index_subset(self.message) or len(source.variables) > 0
),
)
url = stage(
output_file_path,
staged_filename,
mime,
location=self.message.stagingLocation,
logger=self.logger,
)
url = stage(output_file_path,
staged_filename,
mime,
location=self.message.stagingLocation,
logger=self.logger)

# Update the STAC record
result.assets['data'] = Asset(url,
title=staged_filename,
media_type=mime,
roles=['data'])
result.assets['data'] = Asset(
url, title=staged_filename, media_type=mime, roles=['data']
)

# Return the STAC record
return result
@@ -126,8 +138,8 @@ def process_item(self, item: Item, source: Source):
shutil.rmtree(workdir)

def validate_message(self):
""" Check the service was triggered by a valid message containing
the expected number of granules.
"""Check the service was triggered by a valid message containing
the expected number of granules.
"""
if not hasattr(self, 'message'):
@@ -150,9 +162,7 @@ def validate_message(self):
has_items = False

if not has_granules and not has_items:
raise HarmonyException(
'No granules specified for variable subsetting'
)
raise HarmonyException('No granules specified for variable subsetting')

for source in self.message.sources:
if not hasattr(source, 'variables') or not source.variables:
359 changes: 187 additions & 172 deletions hoss/bbox_utilities.py

Large diffs are not rendered by default.

474 changes: 259 additions & 215 deletions hoss/dimension_utilities.py

Large diffs are not rendered by default.

127 changes: 75 additions & 52 deletions hoss/exceptions.py
@@ -7,115 +7,138 @@


class CustomError(Exception):
""" Base class for exceptions in HOSS. This base class allows for future
work, such as assigning exit codes for specific failure modes.
"""Base class for exceptions in HOSS. This base class allows for future
work, such as assigning exit codes for specific failure modes.
"""

def __init__(self, exception_type, message):
self.exception_type = exception_type
self.message = message
super().__init__(self.message)


class InvalidInputGeoJSON(CustomError):
""" This exception is raised when a supplied GeoJSON object does not
adhere to the GeoJSON schema. For example, if a GeoJSON geometry does not
contain either a `bbox` or a `coordinates` attribute.
"""This exception is raised when a supplied GeoJSON object does not
adhere to the GeoJSON schema. For example, if a GeoJSON geometry does not
contain either a `bbox` or a `coordinates` attribute.
"""

def __init__(self):
super().__init__('InvalidInputGeoJSON',
'The supplied shape file cannot be parsed according '
'to the GeoJSON format defined in RFC 7946.')
super().__init__(
'InvalidInputGeoJSON',
'The supplied shape file cannot be parsed according '
'to the GeoJSON format defined in RFC 7946.',
)


class InvalidNamedDimension(CustomError):
""" This exception is raised when a user-supplied dimension name
is not in the list of required dimensions for the subset.
"""This exception is raised when a user-supplied dimension name
is not in the list of required dimensions for the subset.
"""

def __init__(self, dimension_name):
super().__init__('InvalidNamedDimension',
f'"{dimension_name}" is not a dimension for '
'any of the requested variables.')
super().__init__(
'InvalidNamedDimension',
f'"{dimension_name}" is not a dimension for '
'any of the requested variables.',
)


class InvalidRequestedRange(CustomError):
""" This exception is raised when a user-supplied dimension range lies
entirely outside the range of a dimension with an associated bounds
variable.
"""This exception is raised when a user-supplied dimension range lies
entirely outside the range of a dimension with an associated bounds
variable.
"""

def __init__(self):
super().__init__('InvalidRequestedRange',
'Input request specified range outside supported '
'dimension range')
super().__init__(
'InvalidRequestedRange',
'Input request specified range outside supported ' 'dimension range',
)


class MissingGridMappingMetadata(CustomError):
""" This exception is raised when HOSS tries to obtain the `grid_mapping`
metadata attribute for a projected variable and it is not present in
either the input granule or the CF-Convention overrides defined in the
earthdata-varinfo configuration file.
"""This exception is raised when HOSS tries to obtain the `grid_mapping`
metadata attribute for a projected variable and it is not present in
either the input granule or the CF-Convention overrides defined in the
earthdata-varinfo configuration file.
"""

def __init__(self, variable_name):
super().__init__('MissingGridMappingMetadata',
f'Projected variable "{variable_name}" does not have '
'an associated "grid_mapping" metadata attribute.')
super().__init__(
'MissingGridMappingMetadata',
f'Projected variable "{variable_name}" does not have '
'an associated "grid_mapping" metadata attribute.',
)


class MissingGridMappingVariable(CustomError):
""" This exception is raised when HOSS tries to extract attributes from a
`grid_mapping` variable referred to by another variable, but that
`grid_mapping` variable is not present in the `.dmr` for that granule.
"""This exception is raised when HOSS tries to extract attributes from a
`grid_mapping` variable referred to by another variable, but that
`grid_mapping` variable is not present in the `.dmr` for that granule.
"""

def __init__(self, grid_mapping_variable, referring_variable):
super().__init__('MissingGridMappingVariable',
f'Grid mapping variable "{grid_mapping_variable}" '
f'referred to by variable "{referring_variable}" is '
'not present in granule .dmr file.')
super().__init__(
'MissingGridMappingVariable',
f'Grid mapping variable "{grid_mapping_variable}" '
f'referred to by variable "{referring_variable}" is '
'not present in granule .dmr file.',
)


class MissingSpatialSubsetInformation(CustomError):
""" This exception is raised when HOSS reaches a branch of the code that
requires spatial subset information, but neither a bounding box, nor a
shape file is specified.
"""This exception is raised when HOSS reaches a branch of the code that
requires spatial subset information, but neither a bounding box, nor a
shape file is specified.
"""

def __init__(self):
super().__init__('MissingSpatialSubsetInformation',
'Either a bounding box or shape file must be '
'specified when performing spatial subsetting.')
super().__init__(
'MissingSpatialSubsetInformation',
'Either a bounding box or shape file must be '
'specified when performing spatial subsetting.',
)


class UnsupportedShapeFileFormat(CustomError):
""" This exception is raised when the shape file included in the input
Harmony message is not GeoJSON.
"""This exception is raised when the shape file included in the input
Harmony message is not GeoJSON.
"""

def __init__(self, shape_file_mime_type: str):
super().__init__('UnsupportedShapeFileFormat',
f'Shape file format "{shape_file_mime_type}" not '
'supported.')
super().__init__(
'UnsupportedShapeFileFormat',
f'Shape file format "{shape_file_mime_type}" not ' 'supported.',
)


class UnsupportedTemporalUnits(CustomError):
""" This exception is raised when the 'units' metadata attribute contains
a temporal unit that is not supported by HOSS.
"""This exception is raised when the 'units' metadata attribute contains
a temporal unit that is not supported by HOSS.
"""

def __init__(self, units_string):
super().__init__('UnsupportedTemporalUnits',
f'Temporal units "{units_string}" not supported.')
super().__init__(
'UnsupportedTemporalUnits',
f'Temporal units "{units_string}" not supported.',
)


class UrlAccessFailed(CustomError):
""" This exception is raised when an HTTP request for a given URL has a non
500 error, and is therefore not retried.
"""This exception is raised when an HTTP request for a given URL has a non
500 error, and is therefore not retried.
"""

def __init__(self, url, status_code):
super().__init__('UrlAccessFailed',
f'{status_code} error retrieving: {url}')
super().__init__('UrlAccessFailed', f'{status_code} error retrieving: {url}')
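
Because every exception above subclasses `CustomError`, callers can handle all HOSS failures uniformly. A minimal usage sketch, assuming only the constructor signatures shown in this hunk (the call site and URL are illustrative):

```python
from hoss.exceptions import CustomError, UrlAccessFailed

try:
    # Hypothetical failure while retrieving a granule from OPeNDAP:
    raise UrlAccessFailed('https://opendap.example.com/granule.nc4', 404)
except CustomError as error:
    # Every HOSS exception carries `exception_type` and `message` attributes:
    print(f'{error.exception_type}: {error.message}')
```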
480 changes: 260 additions & 220 deletions hoss/projection_utilities.py

Large diffs are not rendered by default.

258 changes: 144 additions & 114 deletions hoss/spatial.py
@@ -21,49 +21,62 @@
For example: [W, S, E, N] = [-20, -90, 20, 90]
"""

from typing import List, Set

from harmony.message import Message
from netCDF4 import Dataset
from numpy.ma.core import MaskedArray
from varinfo import VarInfoFromDmr

from hoss.bbox_utilities import (BBox, get_harmony_message_bbox,
get_shape_file_geojson, get_geographic_bbox)
from hoss.dimension_utilities import (get_dimension_bounds,
get_dimension_extents,
get_dimension_index_range, IndexRange,
IndexRanges)
from hoss.projection_utilities import (get_projected_x_y_extents,
get_projected_x_y_variables,
get_variable_crs)


def get_spatial_index_ranges(required_variables: Set[str],
varinfo: VarInfoFromDmr, dimensions_path: str,
harmony_message: Message,
shape_file_path: str = None) -> IndexRanges:
""" Return a dictionary containing indices that correspond to the minimum
and maximum extents for all horizontal spatial coordinate variables
that support all end-user requested variables. This includes both
geographic and projected horizontal coordinates:
index_ranges = {'/latitude': (12, 34), '/longitude': (56, 78),
'/x': (20, 42), '/y': (31, 53)}
If geographic dimensions are present and only a shape file has been
specified, a minimally encompassing bounding box will be found in order
to determine the longitude and latitude extents.
For projected grids, coordinate dimensions must be considered in x, y
pairs. The minimum and/or maximum values of geographically defined
shapes in the target projected grid may be midway along an exterior
edge of the shape, rather than a known coordinate vertex. For this
reason, a minimum grid resolution in geographic coordinates will be
determined for each projected coordinate variable pair. The input
bounding box or shape file will be populated with additional points
around the exterior of the user-defined GeoJSON shape, to ensure the
correct extents are derived.
from hoss.bbox_utilities import (
BBox,
get_harmony_message_bbox,
get_shape_file_geojson,
get_geographic_bbox,
)
from hoss.dimension_utilities import (
get_dimension_bounds,
get_dimension_extents,
get_dimension_index_range,
IndexRange,
IndexRanges,
)
from hoss.projection_utilities import (
get_projected_x_y_extents,
get_projected_x_y_variables,
get_variable_crs,
)


def get_spatial_index_ranges(
required_variables: Set[str],
varinfo: VarInfoFromDmr,
dimensions_path: str,
harmony_message: Message,
shape_file_path: str = None,
) -> IndexRanges:
"""Return a dictionary containing indices that correspond to the minimum
and maximum extents for all horizontal spatial coordinate variables
that support all end-user requested variables. This includes both
geographic and projected horizontal coordinates:
index_ranges = {'/latitude': (12, 34), '/longitude': (56, 78),
'/x': (20, 42), '/y': (31, 53)}
If geographic dimensions are present and only a shape file has been
specified, a minimally encompassing bounding box will be found in order
to determine the longitude and latitude extents.
For projected grids, coordinate dimensions must be considered in x, y
pairs. The minimum and/or maximum values of geographically defined
shapes in the target projected grid may be midway along an exterior
edge of the shape, rather than a known coordinate vertex. For this
reason, a minimum grid resolution in geographic coordinates will be
determined for each projected coordinate variable pair. The input
bounding box or shape file will be populated with additional points
around the exterior of the user-defined GeoJSON shape, to ensure the
correct extents are derived.
"""
bounding_box = get_harmony_message_bbox(harmony_message)
@@ -72,9 +85,7 @@ def get_spatial_index_ranges(required_variables: Set[str],
geographic_dimensions = varinfo.get_geographic_spatial_dimensions(
required_variables
)
projected_dimensions = varinfo.get_projected_spatial_dimensions(
required_variables
)
projected_dimensions = varinfo.get_projected_spatial_dimensions(required_variables)
non_spatial_variables = required_variables.difference(
varinfo.get_spatial_dimensions(required_variables)
)
@@ -94,89 +105,103 @@ def get_spatial_index_ranges(required_variables: Set[str],

if len(projected_dimensions) > 0:
for non_spatial_variable in non_spatial_variables:
index_ranges.update(get_projected_x_y_index_ranges(
non_spatial_variable, varinfo, dimensions_file,
index_ranges, bounding_box=bounding_box,
shape_file_path=shape_file_path
))
index_ranges.update(
get_projected_x_y_index_ranges(
non_spatial_variable,
varinfo,
dimensions_file,
index_ranges,
bounding_box=bounding_box,
shape_file_path=shape_file_path,
)
)

return index_ranges


def get_projected_x_y_index_ranges(non_spatial_variable: str,
varinfo: VarInfoFromDmr,
dimensions_file: Dataset,
index_ranges: IndexRanges,
bounding_box: BBox = None,
shape_file_path: str = None) -> IndexRanges:
""" This function returns a dictionary containing the minimum and maximum
index ranges for a pair of projection x and y coordinates, e.g.:
index_ranges = {'/x': (20, 42), '/y': (31, 53)}
First, the dimensions of the input, non-spatial variable are checked
for associated projection x and y coordinates. If these are present,
and they have not already been added to the `index_ranges` cache, the
extents of the input spatial subset are determined in these projected
coordinates. This requires the derivation of a minimum resolution of
the target grid in geographic coordinates. Points must be placed along
the exterior of the spatial subset shape. All points are then projected
from a geographic Coordinate Reference System (CRS) to the target grid
CRS. The minimum and maximum values are then derived from these
projected coordinate points.
def get_projected_x_y_index_ranges(
non_spatial_variable: str,
varinfo: VarInfoFromDmr,
dimensions_file: Dataset,
index_ranges: IndexRanges,
bounding_box: BBox = None,
shape_file_path: str = None,
) -> IndexRanges:
"""This function returns a dictionary containing the minimum and maximum
index ranges for a pair of projection x and y coordinates, e.g.:
index_ranges = {'/x': (20, 42), '/y': (31, 53)}
First, the dimensions of the input, non-spatial variable are checked
for associated projection x and y coordinates. If these are present,
and they have not already been added to the `index_ranges` cache, the
extents of the input spatial subset are determined in these projected
coordinates. This requires the derivation of a minimum resolution of
the target grid in geographic coordinates. Points must be placed along
the exterior of the spatial subset shape. All points are then projected
from a geographic Coordinate Reference System (CRS) to the target grid
CRS. The minimum and maximum values are then derived from these
projected coordinate points.
"""
projected_x, projected_y = get_projected_x_y_variables(
varinfo, non_spatial_variable
)

if (
projected_x is not None and projected_y is not None
and not set((projected_x, projected_y)).issubset(
set(index_ranges.keys())
)
projected_x is not None
and projected_y is not None
and not set((projected_x, projected_y)).issubset(set(index_ranges.keys()))
):
crs = get_variable_crs(non_spatial_variable, varinfo)

x_y_extents = get_projected_x_y_extents(
dimensions_file[projected_x][:],
dimensions_file[projected_y][:], crs,
shape_file=shape_file_path, bounding_box=bounding_box
dimensions_file[projected_y][:],
crs,
shape_file=shape_file_path,
bounding_box=bounding_box,
)

x_bounds = get_dimension_bounds(projected_x, varinfo, dimensions_file)
y_bounds = get_dimension_bounds(projected_y, varinfo, dimensions_file)

x_index_ranges = get_dimension_index_range(
dimensions_file[projected_x][:], x_y_extents['x_min'],
x_y_extents['x_max'], bounds_values=x_bounds
dimensions_file[projected_x][:],
x_y_extents['x_min'],
x_y_extents['x_max'],
bounds_values=x_bounds,
)

y_index_ranges = get_dimension_index_range(
dimensions_file[projected_y][:], x_y_extents['y_min'],
x_y_extents['y_max'], bounds_values=y_bounds
dimensions_file[projected_y][:],
x_y_extents['y_min'],
x_y_extents['y_max'],
bounds_values=y_bounds,
)

x_y_index_ranges = {projected_x: x_index_ranges,
projected_y: y_index_ranges}
x_y_index_ranges = {projected_x: x_index_ranges, projected_y: y_index_ranges}
else:
x_y_index_ranges = {}

return x_y_index_ranges


def get_geographic_index_range(dimension: str, varinfo: VarInfoFromDmr,
dimensions_file: Dataset,
bounding_box: BBox) -> IndexRange:
""" Extract the indices that correspond to the minimum and maximum extents
for a specific geographic dimension (longitude or latitude). For
longitudes, it is assumed that the western extent should be considered
the minimum extent. If the bounding box crosses a longitude
discontinuity this will be later identified by the minimum extent index
being larger than the maximum extent index.
def get_geographic_index_range(
dimension: str,
varinfo: VarInfoFromDmr,
dimensions_file: Dataset,
bounding_box: BBox,
) -> IndexRange:
"""Extract the indices that correspond to the minimum and maximum extents
for a specific geographic dimension (longitude or latitude). For
longitudes, it is assumed that the western extent should be considered
the minimum extent. If the bounding box crosses a longitude
discontinuity this will be later identified by the minimum extent index
being larger than the maximum extent index.
The return value from this function is an `IndexRange` tuple of format:
(minimum_index, maximum_index).
The return value from this function is an `IndexRange` tuple of format:
(minimum_index, maximum_index).
"""
variable = varinfo.get_variable(dimension)
@@ -202,44 +227,49 @@ def get_geographic_index_range(dimension: str, varinfo: VarInfoFromDmr,
bounding_box, dimensions_file[dimension][:]
)

return get_dimension_index_range(dimensions_file[dimension][:],
minimum_extent, maximum_extent,
bounds_values=bounds)
return get_dimension_index_range(
dimensions_file[dimension][:],
minimum_extent,
maximum_extent,
bounds_values=bounds,
)


def get_bounding_box_longitudes(bounding_box: BBox,
longitude_array: MaskedArray) -> List[float]:
""" Ensure the bounding box extents are compatible with the range of the
longitude variable. The Harmony bounding box values are expressed in
the range from -180 ≤ longitude (degrees east) ≤ 180, whereas some
collections have grids with discontinuities at the Prime Meridian and
others have sub-pixel wrap-around at the Antimeridian.
def get_bounding_box_longitudes(
bounding_box: BBox, longitude_array: MaskedArray
) -> List[float]:
"""Ensure the bounding box extents are compatible with the range of the
longitude variable. The Harmony bounding box values are expressed in
the range from -180 ≤ longitude (degrees east) ≤ 180, whereas some
collections have grids with discontinuities at the Prime Meridian and
others have sub-pixel wrap-around at the Antimeridian.
"""
min_longitude, max_longitude = get_dimension_extents(longitude_array)

western_box_extent = get_longitude_in_grid(min_longitude, max_longitude,
bounding_box.west)
eastern_box_extent = get_longitude_in_grid(min_longitude, max_longitude,
bounding_box.east)
western_box_extent = get_longitude_in_grid(
min_longitude, max_longitude, bounding_box.west
)
eastern_box_extent = get_longitude_in_grid(
min_longitude, max_longitude, bounding_box.east
)

return [western_box_extent, eastern_box_extent]


def get_longitude_in_grid(grid_min: float, grid_max: float,
longitude: float) -> float:
""" Ensure that a longitude value from the bounding box extents is within
the full longitude range of the grid. If it is not, check the same
value +/- 360 degrees, to see if either of those are present in the
grid. This function returns the value of the three options that lies
within the grid. If none of these values are within the grid, then the
original longitude value is returned.
def get_longitude_in_grid(grid_min: float, grid_max: float, longitude: float) -> float:
"""Ensure that a longitude value from the bounding box extents is within
the full longitude range of the grid. If it is not, check the same
value +/- 360 degrees, to see if either of those are present in the
grid. This function returns the value of the three options that lies
within the grid. If none of these values are within the grid, then the
original longitude value is returned.
This functionality is used for grids where the longitude values are not
-180 ≤ longitude (degrees east) ≤ 180. This includes:
This functionality is used for grids where the longitude values are not
-180 ≤ longitude (degrees east) ≤ 180. This includes:
* RSSMIF16D: 0 ≤ longitude (degrees east) ≤ 360.
* MERRA-2 products: -180.3125 ≤ longitude (degrees east) ≤ 179.6875.
* RSSMIF16D: 0 ≤ longitude (degrees east) ≤ 360.
* MERRA-2 products: -180.3125 ≤ longitude (degrees east) ≤ 179.6875.
"""
decremented_longitude = longitude - 360
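
A hedged sketch of the wrap-around behaviour described in this docstring (not the exact HOSS implementation, which continues from `decremented_longitude` above):

```python
def longitude_in_grid(grid_min: float, grid_max: float, longitude: float) -> float:
    """Sketch: return whichever of longitude, longitude - 360 or
    longitude + 360 lies within [grid_min, grid_max], else the original."""
    for candidate in (longitude, longitude - 360, longitude + 360):
        if grid_min <= candidate <= grid_max:
            return candidate
    return longitude

# RSSMIF16D-style grid (0 ≤ longitude ≤ 360): -20 degrees east maps to 340:
assert longitude_in_grid(0, 360, -20) == 340
```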
324 changes: 192 additions & 132 deletions hoss/subset.py

Large diffs are not rendered by default.

40 changes: 24 additions & 16 deletions hoss/temporal.py
@@ -7,6 +7,7 @@
be combined with any other index ranges (e.g., spatial).
"""

from datetime import datetime, timedelta, timezone
from typing import List, Set

@@ -15,8 +16,11 @@
from netCDF4 import Dataset
from varinfo import VarInfoFromDmr

from hoss.dimension_utilities import (get_dimension_bounds,
get_dimension_index_range, IndexRanges)
from hoss.dimension_utilities import (
get_dimension_bounds,
get_dimension_index_range,
IndexRanges,
)
from hoss.exceptions import UnsupportedTemporalUnits


@@ -26,16 +30,19 @@
units_second = {'second', 'seconds', 'sec', 'secs', 's'}


def get_temporal_index_ranges(required_variables: Set[str],
varinfo: VarInfoFromDmr, dimensions_path: str,
harmony_message: Message) -> IndexRanges:
""" Iterate through the temporal dimension and extract the indices that
correspond to the minimum and maximum extents in that dimension.
def get_temporal_index_ranges(
required_variables: Set[str],
varinfo: VarInfoFromDmr,
dimensions_path: str,
harmony_message: Message,
) -> IndexRanges:
"""Iterate through the temporal dimension and extract the indices that
correspond to the minimum and maximum extents in that dimension.
The return value from this function is a dictionary that contains the
index ranges for the time dimension, such as:
The return value from this function is a dictionary that contains the
index ranges for the time dimension, such as:
index_range = {'/time': [1, 5]}
index_range = {'/time': [1, 5]}
"""
index_ranges = {}
@@ -58,17 +65,18 @@ def get_temporal_index_ranges(required_variables: Set[str],
maximum_extent = (time_end - time_ref) / time_delta

index_ranges[dimension] = get_dimension_index_range(
dimensions_file[dimension][:], minimum_extent, maximum_extent,
bounds_values=get_dimension_bounds(dimension, varinfo,
dimensions_file)
dimensions_file[dimension][:],
minimum_extent,
maximum_extent,
bounds_values=get_dimension_bounds(dimension, varinfo, dimensions_file),
)

return index_ranges


def get_datetime_with_timezone(timestring: str) -> datetime:
""" function to parse string to datetime, and ensure datetime is timezone
"aware". If a timezone is not supplied, it is assumed to be UTC.
"""function to parse string to datetime, and ensure datetime is timezone
"aware". If a timezone is not supplied, it is assumed to be UTC.
"""

@@ -81,7 +89,7 @@ def get_datetime_with_timezone(timestring: str) -> datetime:


def get_time_ref(units_time: str) -> List[datetime]:
""" Retrieve the reference time (epoch) and time step size. """
"""Retrieve the reference time (epoch) and time step size."""
unit, epoch_str = units_time.split(' since ')
ref_time = get_datetime_with_timezone(epoch_str)

104 changes: 58 additions & 46 deletions hoss/utilities.py
@@ -3,6 +3,7 @@
allows finer-grained unit testing of each smaller part of functionality.
"""

from logging import Logger
from os import sep
from os.path import splitext
@@ -19,10 +20,10 @@


def get_file_mimetype(file_name: str) -> Tuple[Optional[str], Optional[str]]:
""" This function tries to infer the MIME type of a file string. If
the `mimetypes.guess_type` function cannot guess the MIME type of the
granule, a default value is returned, which assumes that the file is
a NetCDF-4 file.
"""This function tries to infer the MIME type of a file string. If
the `mimetypes.guess_type` function cannot guess the MIME type of the
granule, a default value is returned, which assumes that the file is
a NetCDF-4 file.
"""
mimetype = mimetypes.guess_type(file_name, False)
@@ -33,13 +34,19 @@ def get_file_mimetype(file_name: str) -> Tuple[Optional[str], Optional[str]]:
return mimetype


def get_opendap_nc4(url: str, required_variables: Set[str], output_dir: str,
logger: Logger, access_token: str, config: Config) -> str:
""" Construct a semi-colon separated string of the required variables and
use it as a constraint expression to retrieve those variables from
OPeNDAP.
def get_opendap_nc4(
url: str,
required_variables: Set[str],
output_dir: str,
logger: Logger,
access_token: str,
config: Config,
) -> str:
"""Construct a semi-colon separated string of the required variables and
use it as a constraint expression to retrieve those variables from
OPeNDAP.
Returns the path of the downloaded granule containing those variables.
Returns the path of the downloaded granule containing those variables.
"""
constraint_expression = get_constraint_expression(required_variables)
@@ -50,31 +57,36 @@ def get_opendap_nc4(url: str, required_variables: Set[str], output_dir: str,
else:
request_data = None

downloaded_nc4 = download_url(netcdf4_url, output_dir, logger,
access_token=access_token, config=config,
data=request_data)
downloaded_nc4 = download_url(
netcdf4_url,
output_dir,
logger,
access_token=access_token,
config=config,
data=request_data,
)

# Rename output file, to ensure repeated data downloads to OPeNDAP will be
# respected by `harmony-service-lib-py`.
return move_downloaded_nc4(output_dir, downloaded_nc4)


def get_constraint_expression(variables: Set[str]) -> str:
""" Take a set of variables and return a URL encoded, semi-colon separated
DAP4 constraint expression to retrieve those variables. Each variable
may or may not specify their index ranges.
"""Take a set of variables and return a URL encoded, semi-colon separated
DAP4 constraint expression to retrieve those variables. Each variable
may or may not specify their index ranges.
"""
return quote(';'.join(variables), safe='')


def move_downloaded_nc4(output_dir: str, downloaded_file: str) -> str:
""" Change the basename of a NetCDF-4 file downloaded from OPeNDAP. The
`harmony-service-lib-py` produces a local filename that is a hex digest
of the requested URL only. If this filename is already present in the
local file system, `harmony-service-lib-py` assumes it does not need to
make another HTTP request, and just returns the constructed file path,
even if a POST request is being made with different parameters.
"""Change the basename of a NetCDF-4 file downloaded from OPeNDAP. The
`harmony-service-lib-py` produces a local filename that is a hex digest
of the requested URL only. If this filename is already present in the
local file system, `harmony-service-lib-py` assumes it does not need to
make another HTTP request, and just returns the constructed file path,
even if a POST request is being made with different parameters.
"""
extension = splitext(downloaded_file)[1] or '.nc4'
@@ -83,19 +95,24 @@ def move_downloaded_nc4(output_dir: str, downloaded_file: str) -> str:
return new_filename


def download_url(url: str, destination: str, logger: Logger,
access_token: str = None, config: Config = None,
data=None) -> str:
""" Use built-in Harmony functionality to download from a URL. This is
expected to be used for obtaining the granule `.dmr`, a prefetch of
only dimensions and bounds variables, and the subsetted granule itself.
def download_url(
url: str,
destination: str,
logger: Logger,
access_token: str = None,
config: Config = None,
data=None,
) -> str:
"""Use built-in Harmony functionality to download from a URL. This is
expected to be used for obtaining the granule `.dmr`, a prefetch of
only dimensions and bounds variables, and the subsetted granule itself.
OPeNDAP can return intermittent 500 errors. Retries will be performed
by inbuilt functionality in the `harmony-service-lib`. The OPeNDAP
errors are captured and re-raised as custom exceptions.
OPeNDAP can return intermittent 500 errors. Retries will be performed
by inbuilt functionality in the `harmony-service-lib`. The OPeNDAP
errors are captured and re-raised as custom exceptions.
The return value is the location in the file-store of the downloaded
content from the URL.
The return value is the location in the file-store of the downloaded
content from the URL.
"""
logger.info(f'Downloading: {url}')
@@ -105,12 +122,7 @@ def download_url(url: str, destination: str, logger: Logger,

try:
response = util_download(
url,
destination,
logger,
access_token=access_token,
data=data,
cfg=config
url, destination, logger, access_token=access_token, data=data, cfg=config
)
except ForbiddenException as harmony_exception:
raise UrlAccessFailed(url, 400) from harmony_exception
@@ -123,25 +135,25 @@ def download_url(url: str, destination: str, logger: Logger,


def format_variable_set_string(variable_set: Set[str]) -> str:
""" Take an input set of variable strings and return a string that does not
contain curly braces, for compatibility with Harmony logging.
"""Take an input set of variable strings and return a string that does not
contain curly braces, for compatibility with Harmony logging.
"""
return ', '.join(variable_set)


def format_dictionary_string(dictionary: Dict) -> str:
""" Take an input dictionary and return a string that does not contain
curly braces (assuming the dictionary is not nested, or doesn't contain
set values).
"""Take an input dictionary and return a string that does not contain
curly braces (assuming the dictionary is not nested, or doesn't contain
set values).
"""
return '\n'.join([f'{key}: {value}' for key, value in dictionary.items()])


def get_value_or_default(value: Optional[float], default: float) -> float:
""" A helper function that will either return the value, if it is supplied,
or a default value if not.
"""A helper function that will either return the value, if it is supplied,
or a default value if not.
"""
return value if value is not None else default
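
As a worked example of `get_constraint_expression` above (the variable name and index ranges are illustrative), the joined variable list is fully percent-encoded:

```python
from urllib.parse import quote

variables = {'/Grid/precipitationCal[1:2][20:42][31:53]'}
constraint = quote(';'.join(variables), safe='')
# '%2FGrid%2FprecipitationCal%5B1%3A2%5D%5B20%3A42%5D%5B31%3A53%5D'
```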
1 change: 1 addition & 0 deletions tests/__init__.py
@@ -1,2 +1,3 @@
import os

os.environ['ENV'] = os.environ.get('ENV') or 'test'
2 changes: 1 addition & 1 deletion tests/data/ATL16_prefetch.dmr
@@ -222,4 +222,4 @@
<Attribute name="short_name" type="String">
<Value>ATL16</Value>
</Attribute>
</Dataset>
2 changes: 1 addition & 1 deletion tests/data/ATL16_prefetch_bnds.dmr
@@ -217,4 +217,4 @@
<Attribute name="short_name" type="String">
<Value>ATL16</Value>
</Attribute>
</Dataset>
2 changes: 1 addition & 1 deletion tests/data/ATL16_prefetch_group.dmr
@@ -216,4 +216,4 @@
<Attribute name="short_name" type="String">
<Value>ATL16</Value>
</Attribute>
</Dataset>
4 changes: 2 additions & 2 deletions tests/data/GPM_3IMERGHH_example.dmr
@@ -109,7 +109,7 @@ EndianType=LITTLE_ENDIAN;
</Attribute>
<Attribute name="LongName" type="String">
<Value>Longitude at the center of
0.10 degree grid intervals of longitude
from -180 to 180.</Value>
</Attribute>
<Attribute name="bounds" type="String">
@@ -157,7 +157,7 @@ EndianType=LITTLE_ENDIAN;
<Value>time</Value>
</Attribute>
<Attribute name="LongName" type="String">
<Value>Representative time of data in
seconds since 1970-01-01 00:00:00 UTC.</Value>
</Attribute>
<Attribute name="bounds" type="String">
2 changes: 1 addition & 1 deletion tests/data/README.md
@@ -91,4 +91,4 @@
* ATL16_prefetch_bnds.dmr
- An example `.dmr` file that is nearly identical to the `ATL16_prefetch.dmr` file
except for four additional fabricated variables that represented the four
possible cases of combining bounds variable existence and cell alignment.
2 changes: 1 addition & 1 deletion tests/geojson_examples/multilinestring.geo.json
@@ -35,4 +35,4 @@
}
}
]
}
1 change: 1 addition & 0 deletions tests/pip_test_requirements.txt
@@ -1,4 +1,5 @@
coverage~=7.2.2
pre-commit~=3.7.0
pycodestyle~=2.10.0
pylint~=2.17.2
unittest-xml-reporting~=3.2.0
2,718 changes: 1,694 additions & 1,024 deletions tests/test_adapter.py

Large diffs are not rendered by default.

25 changes: 14 additions & 11 deletions tests/test_code_format.py
@@ -5,26 +5,29 @@


class TestCodeFormat(TestCase):
""" This test class should ensure all Harmony service Python code adheres
to standard Python code styling.
"""This test class should ensure all Harmony service Python code adheres
to standard Python code styling.
Ignored errors and warnings:
Ignored errors and warnings:
* E501: Line length, which defaults to 80 characters. This is a
preferred feature of the code, but not always easily achieved.
* W503: Break before binary operator. Have to ignore one of W503 or
W504 to allow for breaking of some long lines. PEP8 suggests
breaking the line before a binary operator is more "Pythonic".
* E501: Line length, which defaults to 80 characters. This is a
preferred feature of the code, but not always easily achieved.
* W503: Break before binary operator. Have to ignore one of W503 or
W504 to allow for breaking of some long lines. PEP8 suggests
breaking the line before a binary operator is more "Pythonic".
* E203, E701: This repository uses black code formatting, which deviates
from PEP8 for these errors.
"""

@classmethod
def setUpClass(cls):
cls.python_files = Path('hoss').rglob('*.py')

def test_pycodestyle_adherence(self):
""" Ensure all code in the `hoss` directory adheres to PEP8
defined standard.
"""Ensure all code in the `hoss` directory adheres to PEP8
defined standard.
"""
style_guide = StyleGuide(ignore=['E501', 'W503'])
style_guide = StyleGuide(ignore=['E501', 'W503', 'E203', 'E701'])
results = style_guide.check_files(self.python_files)
self.assertEqual(results.total_errors, 0, 'Found code style issues.')
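
The same check can be reproduced outside the test suite with the `pycodestyle` command line, using the updated ignore list:

```bash
# Mirror TestCodeFormat outside unittest (same ignored codes):
pycodestyle --ignore=E501,W503,E203,E701 hoss/
```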
1 change: 1 addition & 0 deletions tests/unit/__init__.py
@@ -1,2 +1,3 @@
import os

os.environ['ENV'] = os.environ.get('ENV') or 'test'
597 changes: 323 additions & 274 deletions tests/unit/test_adapter.py

Large diffs are not rendered by default.

558 changes: 319 additions & 239 deletions tests/unit/test_bbox_utilities.py

Large diffs are not rendered by default.

954 changes: 524 additions & 430 deletions tests/unit/test_dimension_utilities.py

Large diffs are not rendered by default.

636 changes: 384 additions & 252 deletions tests/unit/test_projection_utilities.py

Large diffs are not rendered by default.

391 changes: 214 additions & 177 deletions tests/unit/test_spatial.py

Large diffs are not rendered by default.

1,649 changes: 984 additions & 665 deletions tests/unit/test_subset.py

Large diffs are not rendered by default.

105 changes: 55 additions & 50 deletions tests/unit/test_temporal.py
@@ -11,18 +11,21 @@
from varinfo import VarInfoFromDmr

from hoss.exceptions import UnsupportedTemporalUnits
from hoss.temporal import (get_datetime_with_timezone,
get_temporal_index_ranges,
get_time_ref)
from hoss.temporal import (
get_datetime_with_timezone,
get_temporal_index_ranges,
get_time_ref,
)


class TestTemporal(TestCase):
""" A class for testing functions in the hoss.spatial module. """
"""A class for testing functions in the hoss.spatial module."""

@classmethod
def setUpClass(cls):
cls.varinfo = VarInfoFromDmr(
'tests/data/M2T1NXSLV_example.dmr',
config_file='tests/data/test_subsetter_config.json'
config_file='tests/data/test_subsetter_config.json',
)
cls.test_dir = 'tests/output'

@@ -33,52 +36,50 @@ def tearDown(self):
rmtree(self.test_dir)

def test_get_temporal_index_ranges(self):
""" Ensure that correct temporal index ranges can be calculated. """
"""Ensure that correct temporal index ranges can be calculated."""
test_file_name = f'{self.test_dir}/test.nc'
harmony_message = Message({
'temporal': {'start': '2021-01-10T01:30:00',
'end': '2021-01-10T05:30:00'}
})
harmony_message = Message(
{'temporal': {'start': '2021-01-10T01:30:00', 'end': '2021-01-10T05:30:00'}}
)

with Dataset(test_file_name, 'w', format='NETCDF4') as test_file:
test_file.createDimension('time', size=24)

test_file.createVariable('time', int,
dimensions=('time', ))
test_file.createVariable('time', int, dimensions=('time',))
test_file['time'][:] = np.linspace(0, 1380, 24)
test_file['time'].setncatts({'units': 'minutes since 2021-01-10 00:30:00'})

with self.subTest('Time dimension, halfway between the whole hours'):
self.assertDictEqual(
get_temporal_index_ranges({'/time'}, self.varinfo,
test_file_name, harmony_message),
{'/time': (1, 5)}
get_temporal_index_ranges(
{'/time'}, self.varinfo, test_file_name, harmony_message
),
{'/time': (1, 5)},
)

@patch('hoss.temporal.get_dimension_index_range')
def test_get_temporal_index_ranges_bounds(self,
mock_get_dimension_index_range):
""" Ensure that bounds are correctly extracted and used as an argument
for the `get_dimension_index_range` utility function if they are
present in the prefetch file.
def test_get_temporal_index_ranges_bounds(self, mock_get_dimension_index_range):
"""Ensure that bounds are correctly extracted and used as an argument
for the `get_dimension_index_range` utility function if they are
present in the prefetch file.
The GPM IMERG prefetch data are for a granule with a temporal range
of 2020-01-01T12:00:00 to 2020-01-01T12:30:00.
The GPM IMERG prefetch data are for a granule with a temporal range
of 2020-01-01T12:00:00 to 2020-01-01T12:30:00.
"""
mock_get_dimension_index_range.return_value = (1, 2)
gpm_varinfo = VarInfoFromDmr('tests/data/GPM_3IMERGHH_example.dmr')
gpm_prefetch_path = 'tests/data/GPM_3IMERGHH_prefetch.nc4'

harmony_message = Message({
'temporal': {'start': '2020-01-01T12:15:00',
'end': '2020-01-01T12:45:00'}
})
harmony_message = Message(
{'temporal': {'start': '2020-01-01T12:15:00', 'end': '2020-01-01T12:45:00'}}
)

self.assertDictEqual(
get_temporal_index_ranges({'/Grid/time'}, gpm_varinfo,
gpm_prefetch_path, harmony_message),
{'/Grid/time': (1, 2)}
get_temporal_index_ranges(
{'/Grid/time'}, gpm_varinfo, gpm_prefetch_path, harmony_message
),
{'/Grid/time': (1, 2)},
)
mock_get_dimension_index_range.assert_called_once_with(
ANY, 1577880900.0, 1577882700, bounds_values=ANY
@@ -87,64 +88,68 @@ def test_get_temporal_index_ranges_bounds(self,
with Dataset(gpm_prefetch_path) as prefetch:
assert_array_equal(
mock_get_dimension_index_range.call_args_list[0][0][0],
prefetch['/Grid/time'][:]
prefetch['/Grid/time'][:],
)
assert_array_equal(
mock_get_dimension_index_range.call_args_list[0][1]['bounds_values'],
prefetch['/Grid/time_bnds'][:]
prefetch['/Grid/time_bnds'][:],
)

def test_get_time_ref(self):
""" Ensure the 'units' attribute tells the correct time_ref and
time_delta
"""Ensure the 'units' attribute tells the correct time_ref and
time_delta
"""
expected_datetime = datetime(2021, 12, 8, 0, 30, tzinfo=timezone.utc)

with self.subTest('units of minutes'):
self.assertEqual(get_time_ref('minutes since 2021-12-08 00:30:00'),
(expected_datetime, timedelta(minutes=1)))
self.assertEqual(
get_time_ref('minutes since 2021-12-08 00:30:00'),
(expected_datetime, timedelta(minutes=1)),
)

with self.subTest('Units of seconds'):
self.assertEqual(get_time_ref('seconds since 2021-12-08 00:30:00'),
(expected_datetime, timedelta(seconds=1)))
self.assertEqual(
get_time_ref('seconds since 2021-12-08 00:30:00'),
(expected_datetime, timedelta(seconds=1)),
)

with self.subTest('Units of hours'):
self.assertEqual(get_time_ref('hours since 2021-12-08 00:30:00'),
(expected_datetime, timedelta(hours=1)))
self.assertEqual(
get_time_ref('hours since 2021-12-08 00:30:00'),
(expected_datetime, timedelta(hours=1)),
)

with self.subTest('Units of days'):
self.assertEqual(get_time_ref('days since 2021-12-08 00:30:00'),
(expected_datetime, timedelta(days=1)))
self.assertEqual(
get_time_ref('days since 2021-12-08 00:30:00'),
(expected_datetime, timedelta(days=1)),
)

with self.subTest('Unrecognised unit'):
with self.assertRaises(UnsupportedTemporalUnits):
get_time_ref('fortnights since 2021-12-08 00:30:00')

def test_get_datetime_with_timezone(self):
""" Ensure the string is parsed to datetime with timezone. """
"""Ensure the string is parsed to datetime with timezone."""
expected_datetime = datetime(2021, 12, 8, 0, 30, tzinfo=timezone.utc)

with self.subTest('with space'):
self.assertEqual(
get_datetime_with_timezone('2021-12-08 00:30:00'),
expected_datetime
get_datetime_with_timezone('2021-12-08 00:30:00'), expected_datetime
)

with self.subTest('no space'):
self.assertEqual(
get_datetime_with_timezone('2021-12-08T00:30:00'),
expected_datetime
get_datetime_with_timezone('2021-12-08T00:30:00'), expected_datetime
)

with self.subTest('no space with trailing Z'):
self.assertEqual(
get_datetime_with_timezone('2021-12-08T00:30:00Z'),
expected_datetime
get_datetime_with_timezone('2021-12-08T00:30:00Z'), expected_datetime
)

with self.subTest('space with trailing Z'):
self.assertEqual(
get_datetime_with_timezone('2021-12-08 00:30:00Z'),
expected_datetime
get_datetime_with_timezone('2021-12-08 00:30:00Z'), expected_datetime
)
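As an aside, a minimal sketch of the units arithmetic these tests verify (illustrative values only; it does not call hoss.temporal itself):

from datetime import datetime, timedelta, timezone

# 'minutes since 2021-01-10 00:30:00' yields this reference time and
# delta; a Harmony temporal bound then maps to an offset within the
# time variable's values.
time_ref = datetime(2021, 1, 10, 0, 30, tzinfo=timezone.utc)
time_delta = timedelta(minutes=1)

range_start = datetime(2021, 1, 10, 1, 30, tzinfo=timezone.utc)
print((range_start - time_ref) / time_delta)  # 60.0 minutes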
180 changes: 105 additions & 75 deletions tests/unit/test_utilities.py
@@ -6,15 +6,20 @@
from harmony.util import config

from hoss.exceptions import UrlAccessFailed
from hoss.utilities import (download_url, format_dictionary_string,
format_variable_set_string,
get_constraint_expression, get_file_mimetype,
get_opendap_nc4, get_value_or_default,
move_downloaded_nc4)
from hoss.utilities import (
download_url,
format_dictionary_string,
format_variable_set_string,
get_constraint_expression,
get_file_mimetype,
get_opendap_nc4,
get_value_or_default,
move_downloaded_nc4,
)


class TestUtilities(TestCase):
""" A class for testing functions in the hoss.utilities module. """
"""A class for testing functions in the hoss.utilities module."""

@classmethod
def setUpClass(cls):
@@ -24,9 +29,9 @@ def setUpClass(cls):
cls.logger = getLogger('tests')

def test_get_file_mimetype(self):
""" Ensure a mimetype can be retrieved for a valid file path or, if
the mimetype cannot be inferred, that the default output is
returned. This assumes the output is a NetCDF-4 file.
"""Ensure a mimetype can be retrieved for a valid file path or, if
the mimetype cannot be inferred, that the default output is
returned. This assumes the output is a NetCDF-4 file.
"""
with self.subTest('File with MIME type'):
@@ -41,9 +46,9 @@ def test_get_file_mimetype(self):

@patch('hoss.utilities.util_download')
def test_download_url(self, mock_util_download):
""" Ensure that the `harmony.util.download` function is called. If an
error occurs, the caught exception should be re-raised with a
custom exception with a human-readable error message.
"""Ensure that the `harmony.util.download` function is called. If an
error occurs, the caught exception should be re-raised with a
custom exception with a human-readable error message.
"""
output_directory = 'output/dir'
@@ -55,8 +60,9 @@ def test_download_url(self, mock_util_download):

with self.subTest('Successful response, only make one request.'):
mock_util_download.return_value = http_response
response = download_url(test_url, output_directory, self.logger,
access_token, self.config)
response = download_url(
test_url, output_directory, self.logger, access_token, self.config
)

self.assertEqual(response, http_response)
mock_util_download.assert_called_once_with(
@@ -65,14 +71,20 @@ def test_download_url(self, mock_util_download):
self.logger,
access_token=access_token,
data=None,
cfg=self.config
cfg=self.config,
)
mock_util_download.reset_mock()

with self.subTest('A request with data passes the data to Harmony.'):
mock_util_download.return_value = http_response
response = download_url(test_url, output_directory, self.logger,
access_token, self.config, data=test_data)
response = download_url(
test_url,
output_directory,
self.logger,
access_token,
self.config,
data=test_data,
)

self.assertEqual(response, http_response)
mock_util_download.assert_called_once_with(
@@ -81,54 +93,54 @@ def test_download_url(self, mock_util_download):
self.logger,
access_token=access_token,
data=test_data,
cfg=self.config
cfg=self.config,
)
mock_util_download.reset_mock()

with self.subTest('500 error is caught and handled.'):
mock_util_download.side_effect = [self.harmony_500_error,
http_response]
mock_util_download.side_effect = [self.harmony_500_error, http_response]

with self.assertRaises(UrlAccessFailed):
download_url(test_url, output_directory, self.logger,
access_token, self.config)
download_url(
test_url, output_directory, self.logger, access_token, self.config
)

mock_util_download.assert_called_once_with(
test_url,
output_directory,
self.logger,
access_token=access_token,
data=None,
cfg=self.config
cfg=self.config,
)
mock_util_download.reset_mock()

with self.subTest('Non-500 error does not retry, and is re-raised.'):
mock_util_download.side_effect = [self.harmony_auth_error,
http_response]
mock_util_download.side_effect = [self.harmony_auth_error, http_response]

with self.assertRaises(UrlAccessFailed):
download_url(test_url, output_directory, self.logger,
access_token, self.config)
download_url(
test_url, output_directory, self.logger, access_token, self.config
)

mock_util_download.assert_called_once_with(
test_url,
output_directory,
self.logger,
access_token=access_token,
data=None,
cfg=self.config
cfg=self.config,
)
mock_util_download.reset_mock()

@patch('hoss.utilities.move_downloaded_nc4')
@patch('hoss.utilities.util_download')
def test_get_opendap_nc4(self, mock_download, mock_move_download):
""" Ensure a request is sent to OPeNDAP that combines the URL of the
granule with a constraint expression.
"""Ensure a request is sent to OPeNDAP that combines the URL of the
granule with a constraint expression.
Once the request is completed, the output file should be moved to
ensure a second request to the same URL is still performed.
Once the request is completed, the output file should be moved to
ensure a second request to the same URL is still performed.
"""
downloaded_file_name = 'output_file.nc4'
@@ -143,83 +155,99 @@ def test_get_opendap_nc4(self, mock_download, mock_move_download):
expected_data = {'dap4.ce': 'variable'}

with self.subTest('Request with variables includes dap4.ce'):
output_file = get_opendap_nc4(url, required_variables, output_dir,
self.logger, access_token,
self.config)
output_file = get_opendap_nc4(
url,
required_variables,
output_dir,
self.logger,
access_token,
self.config,
)

self.assertEqual(output_file, moved_file_name)
mock_download.assert_called_once_with(
f'{url}.dap.nc4', output_dir, self.logger,
access_token=access_token, data=expected_data, cfg=self.config
f'{url}.dap.nc4',
output_dir,
self.logger,
access_token=access_token,
data=expected_data,
cfg=self.config,
)
mock_move_download.assert_called_once_with(output_dir,
downloaded_file_name)
mock_move_download.assert_called_once_with(output_dir, downloaded_file_name)

mock_download.reset_mock()
mock_move_download.reset_mock()

with self.subTest('Request with no variables omits dap4.ce'):
output_file = get_opendap_nc4(url, {}, output_dir, self.logger,
access_token, self.config)
output_file = get_opendap_nc4(
url, {}, output_dir, self.logger, access_token, self.config
)

self.assertEqual(output_file, moved_file_name)
mock_download.assert_called_once_with(
f'{url}.dap.nc4', output_dir, self.logger,
access_token=access_token, data=None, cfg=self.config
f'{url}.dap.nc4',
output_dir,
self.logger,
access_token=access_token,
data=None,
cfg=self.config,
)
mock_move_download.assert_called_once_with(output_dir,
downloaded_file_name)
mock_move_download.assert_called_once_with(output_dir, downloaded_file_name)
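A hedged sketch (names assumed) of the request shape these assertions capture: the granule URL gains a '.dap.nc4' suffix, and any constraint expression is passed as the 'dap4.ce' entry of the data payload:

granule_url = 'https://opendap.example.com/granule'
request_url = f'{granule_url}.dap.nc4'
request_data = {'dap4.ce': '%2Falpha_var'}  # URL-encoded '/alpha_var'
print(request_url, request_data)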

def test_get_constraint_expression(self):
""" Ensure a correctly encoded DAP4 constraint expression is
constructed for the given input.
"""Ensure a correctly encoded DAP4 constraint expression is
constructed for the given input.
URL encoding:
URL encoding:
- %2F = '/'
- %3A = ':'
- %3B = ';'
- %5B = '['
- %5D = ']'
- %2F = '/'
- %3A = ':'
- %3B = ';'
- %5B = '['
- %5D = ']'
Note - with sets, the order can't be guaranteed, so there are two
options for the combined constraint expression.
Note - with sets, the order can't be guaranteed, so there are two
options for the combined constraint expression.
"""
with self.subTest('No index ranges specified'):
self.assertIn(
get_constraint_expression({'/alpha_var', '/blue_var'}),
['%2Falpha_var%3B%2Fblue_var', '%2Fblue_var%3B%2Falpha_var']
['%2Falpha_var%3B%2Fblue_var', '%2Fblue_var%3B%2Falpha_var'],
)

with self.subTest('Variables with index ranges'):
self.assertIn(
get_constraint_expression({'/alpha_var[1:2]', '/blue_var[3:4]'}),
['%2Falpha_var%5B1%3A2%5D%3B%2Fblue_var%5B3%3A4%5D',
'%2Fblue_var%5B3%3A4%5D%3B%2Falpha_var%5B1%3A2%5D']
[
'%2Falpha_var%5B1%3A2%5D%3B%2Fblue_var%5B3%3A4%5D',
'%2Fblue_var%5B3%3A4%5D%3B%2Falpha_var%5B1%3A2%5D',
],
)
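To make the expected encodings concrete, a short sketch of one way to produce them (urllib here is an assumption; the helper under test lives in hoss.utilities):

from urllib.parse import quote

variables = ['/alpha_var[1:2]', '/blue_var[3:4]']
print(quote(';'.join(variables), safe=''))
# %2Falpha_var%5B1%3A2%5D%3B%2Fblue_var%5B3%3A4%5D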

@patch('hoss.utilities.move')
@patch('hoss.utilities.uuid4')
def test_move_downloaded_nc4(self, mock_uuid4, mock_move):
""" Ensure a specified file is moved to the specified location. """
"""Ensure a specified file is moved to the specified location."""
mock_uuid4.return_value = Mock(hex='uuid4')
output_dir = '/tmp/path/to'
old_path = '/tmp/path/to/file.nc4'

self.assertEqual(move_downloaded_nc4(output_dir, old_path),
'/tmp/path/to/uuid4.nc4')
self.assertEqual(
move_downloaded_nc4(output_dir, old_path), '/tmp/path/to/uuid4.nc4'
)

mock_move.assert_called_once_with('/tmp/path/to/file.nc4',
'/tmp/path/to/uuid4.nc4')
mock_move.assert_called_once_with(
'/tmp/path/to/file.nc4', '/tmp/path/to/uuid4.nc4'
)

def test_format_variable_set(self):
""" Ensure a set of variable strings is printed out as expected, and
does not contain any curly braces.
"""Ensure a set of variable strings is printed out as expected, and
does not contain any curly braces.
The formatted string is broken up for verification because sets are
unordered, so the exact ordering of the variables within the
formatted string may not be consistent between runs.
The formatted string is broken up for verification because sets are
unordered, so the exact ordering of the variables within the
formatted string may not be consistent between runs.
"""
variable_set = {'/var_one', '/var_two', '/var_three'}
@@ -230,19 +258,21 @@ def test_format_variable_set(self):
self.assertSetEqual(variable_set, set(formatted_string.split(', ')))

def test_format_dictionary_string(self):
""" Ensure a dictionary is formatted to a string without curly braces.
This function assumes only a single level dictionary, without any
sets for values.
"""Ensure a dictionary is formatted to a string without curly braces.
This function assumes only a single level dictionary, without any
sets for values.
"""
input_dictionary = {'key_one': 'value_one', 'key_two': 'value_two'}

self.assertEqual(format_dictionary_string(input_dictionary),
'key_one: value_one\nkey_two: value_two')
self.assertEqual(
format_dictionary_string(input_dictionary),
'key_one: value_one\nkey_two: value_two',
)

def test_get_value_or_default(self):
""" Ensure a value is retrieved if supplied, even if it is 0, or a
default value is returned if not.
"""Ensure a value is retrieved if supplied, even if it is 0, or a
default value is returned if not.
"""
with self.subTest('Value is returned'):
38 changes: 22 additions & 16 deletions tests/utilities.py
@@ -1,4 +1,5 @@
""" Utility classes used to extend the unittest capabilities """

from collections import namedtuple
from datetime import datetime
from typing import List
@@ -12,9 +13,9 @@


def write_dmr(output_dir: str, content: str):
""" A helper function to write out the content of a `.dmr`, when the
`harmony.util.download` function is called. This will be called as
a side-effect to the mock for that function.
"""A helper function to write out the content of a `.dmr`, when the
`harmony.util.download` function is called. This will be called as
a side-effect to the mock for that function.
"""
dmr_name = f'{output_dir}/downloaded.dmr'
@@ -59,32 +60,37 @@ def wrapper(self, *args, **kwargs):
raise
return_values.append(result)
return result

wrapper.mock = mock
wrapper.return_values = return_values
wrapper.errors = errors
return wrapper


def create_stac(granules: List[Granule]) -> Catalog:
""" Create a SpatioTemporal Asset Catalog (STAC). These are used as inputs
for Harmony requests, containing the URL and other information for
input granules.
"""Create a SpatioTemporal Asset Catalog (STAC). These are used as inputs
for Harmony requests, containing the URL and other information for
input granules.
For simplicity the geometry and temporal properties of each item are
set to default values, as only the URL, media type and role are used by
HOSS.
For simplicity the geometry and temporal properties of each item are
set to default values, as only the URL, media type and role are used by
HOSS.
"""
catalog = Catalog(id='input', description='test input')

for granule_index, granule in enumerate(granules):
item = Item(id=f'granule_{granule_index}',
geometry=bbox_to_geometry([-180, -90, 180, 90]),
bbox=[-180, -90, 180, 90],
datetime=datetime(2020, 1, 1), properties=None)
item.add_asset('input_data',
Asset(granule.url, media_type=granule.media_type,
roles=granule.roles))
item = Item(
id=f'granule_{granule_index}',
geometry=bbox_to_geometry([-180, -90, 180, 90]),
bbox=[-180, -90, 180, 90],
datetime=datetime(2020, 1, 1),
properties=None,
)
item.add_asset(
'input_data',
Asset(granule.url, media_type=granule.media_type, roles=granule.roles),
)
catalog.add_item(item)

return catalog
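A possible usage sketch for the helper above (the Granule field order is an assumption inferred from how create_stac reads its attributes):

granules = [
    Granule('https://example.com/granule.nc4', 'application/x-netcdf4', ['data'])
]
stac_catalog = create_stac(granules)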
