-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAS-1177: Add functions and CF overrides to create artificial bounds … #5
Changes from 7 commits
2244c71
02a6c37
71c39c1
fec7cf0
975861d
ca3bd74
7a872fd
6aa595a
b906d2a
e5f64d3
1086e14
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,18 @@ | ||
## v1.0.2 | ||
### 2024-2-26 | ||
|
||
This version of HOSS correctly handles edge-aligned geographic collections by | ||
adding the attribute `cell_alignment` with the value `edge` to `hoss_config.json` | ||
for edge-aligned collections (namely, ATL16), and by adding functions that | ||
create pseudo bounds for edge-aligned collections to make HOSS use the | ||
`dimension_utilities.py` function, `get_dimension_indices_from_bounds`. | ||
|
||
This change also includes an addition of a CF override that addresses an | ||
issue with the ATL16 metadata for the variables `/spolar_asr_obs_grid` and | ||
`/spolar_lorate_blowing_snow_freq` where their `grid_mapping` attribute points | ||
to north polar variables instead of south polar variables. This CF Override | ||
will have to be removed if/when the metadata is corrected. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: It's a bit of a strong statement to say it will have to be removed. It can be (and I think it would be good to do so), but it's unlikely that it would break anything by having an override that has the same information as the attribute it's overriding. |
||
|
||
## v1.0.1 | ||
### 2023-12-19 | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
1.0.1 | ||
1.0.2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,14 +12,15 @@ | |
from logging import Logger | ||
from typing import Dict, Set, Tuple | ||
|
||
from pathlib import PurePosixPath | ||
from netCDF4 import Dataset | ||
from numpy.ma.core import MaskedArray | ||
import numpy as np | ||
|
||
from harmony.message import Message | ||
from harmony.message_utility import rgetattr | ||
from harmony.util import Config | ||
from varinfo import VarInfoFromDmr | ||
from varinfo import VarInfoFromDmr, VariableFromDmr | ||
|
||
from hoss.bbox_utilities import flatten_list | ||
from hoss.exceptions import InvalidNamedDimension, InvalidRequestedRange | ||
|
@@ -75,8 +76,149 @@ def prefetch_dimension_variables(opendap_url: str, varinfo: VarInfoFromDmr, | |
|
||
logger.info('Variables being retrieved in prefetch request: ' | ||
f'{format_variable_set_string(required_dimensions)}') | ||
return get_opendap_nc4(opendap_url, required_dimensions, output_dir, | ||
logger, access_token, config) | ||
|
||
required_dimensions_nc4 = get_opendap_nc4(opendap_url, | ||
required_dimensions, output_dir, | ||
logger, access_token, config) | ||
|
||
# Create bounds variables if necessary. | ||
add_bounds_variables(required_dimensions_nc4, required_dimensions, | ||
varinfo, logger) | ||
|
||
return required_dimensions_nc4 | ||
|
||
|
||
def add_bounds_variables(dimensions_nc4: str, | ||
required_dimensions: Set[str], | ||
varinfo: VarInfoFromDmr, | ||
logger: Logger) -> None: | ||
""" Augment a NetCDF4 file with artificial bounds variables for each | ||
dimension variable that is edge-aligned and does not already | ||
have bounds variables. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Total nitpick, but it might be worth adding to this first paragraph that the need for adding artificial bounds variables is identified by the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you mean that I should say we're making these bounds by adding the CF override to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the latter - i.e., There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I might slightly disagree with some of the text above. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But the "Augment a NetCDF4 file with artificial bounds variables for each dimension variable that has been identified by Or instead of (I may over-complicating this, I'm sorry!) |
||
|
||
For each dimension variable: | ||
(1) Check if the variable needs a bounds variable. | ||
(2) If so, create a bounds array from within the `write_bounds` function. | ||
(3) Then write the bounds variable to the NetCDF4 URL. | ||
|
||
""" | ||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||
with Dataset(dimensions_nc4, 'r+') as prefetch_dataset: | ||
for dimension_name in required_dimensions: | ||
dimension_variable = varinfo.get_variable(dimension_name) | ||
if needs_bounds(dimension_variable): | ||
write_bounds(prefetch_dataset, dimension_variable) | ||
|
||
logger.info('Artificial bounds added for dimension variable: ' | ||
f'{dimension_name}') | ||
|
||
|
||
def needs_bounds(dimension: VariableFromDmr) -> bool: | ||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||
""" Check if a dimension variable needs a bounds variable. | ||
This will be the case when dimension cells are edge-aligned | ||
and bounds for that dimension do not already exist. | ||
|
||
""" | ||
return ( | ||
dimension.attributes.get('cell_alignment') == 'edge' | ||
and dimension.references.get('bounds') is None | ||
) | ||
|
||
|
||
def get_bounds_array(prefetch_dataset: Dataset, | ||
dimension_path: str) -> np.ndarray: | ||
""" Create an array containing the minimum and maximum bounds | ||
for each pixel in a given dimension. | ||
|
||
The minimum and maximum values are determined under the assumption | ||
that the dimension data is monotonically increasing and contiguous. | ||
So for every bounds but the last, the bounds are simply extracted | ||
from the dimension dataset. | ||
|
||
The final bounds must be calculated with the assumption that | ||
the last data cell is edge-aligned and thus has a value the does | ||
not account for the cell length. So, the final bound is determined | ||
by taking the median of all the resolutions in the dataset to obtain | ||
a resolution that can be added to the final data value. | ||
|
||
Ex: Input dataset with resolution of 3 degrees: [ ... , 81, 84, 87] | ||
|
||
Minimum | Maximum | ||
<...> <...> | ||
81 84 | ||
84 87 | ||
87 ? -> 87 + median resolution -> 87 + 3 -> 90 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Brilliant example here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have a dumb question about edge-alignment. Is it always the case that when something is edge aligned, the values are such that they represent the left side of the gridcell? or in the case of a vertical y-positive axis dimension, the bottom? I'm probably confusing this with something else, but I can't find the reference at all. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know that it's always the case, but I think that it's what we've seen. I tried to find a quick reference, but couldn't. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For ATL16, the origin is bottom left. That's why. Most products are top left |
||
|
||
""" | ||
# Access the dimension variable's data using the variable's full path. | ||
dimension_array = prefetch_dataset[dimension_path][:] | ||
|
||
median_resolution = np.median(np.diff(dimension_array)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doesn't the dimension array have constant differences between cells? If so, this is probably doing more math then we need. we can just compute the difference between any two cells. If not, then this won't be precisely accurate, is this "good enough" for subsetting that we're going for? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No, I'm pretty sure any type of edge alignment is possible. We decided to stick with handling the lower left alignment case since that's ATL16's alignment and accounting for all the possible alignments could balloon very quickly. I think we're thinking/hoping that this is the most common edge-alignment too. I might have just made some of that up though, what say you @owenlittlejohns? All that being said, this should probably be documented in the code, huh. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@autydp and @owenlittlejohns looked over this calculation with me, and they figured the cell differences might not always be constant. So yes, what I currently have here is a "good enough". They can weigh in though! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I might still argue that the (N-2, N-1) difference is a better predictor of the (N-1, N) difference than the entire median, but I will not push it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the main thing here is that there can be a bit of floating-point level variation. Maybe that's not super important for the very last pixel, but it's bitten us before at times for requests with bounding boxes hitting the exact boundary between pixels. My preference for the median diff comes from a combination of that, and that it's still all a pretty straightforward one-liner to get it with |
||
|
||
# This array is the transpose of what is required, just for easier assignment | ||
# of values (indices are [row, column]) during the bounds calculations: | ||
cell_bounds = np.zeros(shape=(2, dimension_array.size), dtype=dimension_array.dtype) | ||
|
||
# Minimum values are equal to the dimension pixel values (for lower left pixel alignment): | ||
cell_bounds[0] = dimension_array[:] | ||
|
||
# Maximum values are the next dimension pixel values (for lower left pixel alignment), | ||
# so these values almost mirror the minimum values but start at the second pixel | ||
# instead of the first. Here we calculate each bound except for the very last one. | ||
cell_bounds[1][:-1] = dimension_array[1:] | ||
|
||
# Last maximum value is the last pixel value (minimum) plus the median resolution: | ||
cell_bounds[1][-1] = dimension_array[-1] + median_resolution | ||
|
||
# Return transpose of array to get correct shape: | ||
return cell_bounds.T | ||
|
||
|
||
def write_bounds(prefetch_dataset: Dataset, | ||
dimension_variable: VariableFromDmr) -> None: | ||
""" Write the input bounds array to a given dimension dataset. | ||
|
||
First a new dimension is created for the new bounds variable | ||
to allow the variable to be two-dimensional. | ||
|
||
Then the new bounds variable is created using two dimensions: | ||
(1) the existing dimension of the dimension dataset, and | ||
(2) the new bounds variable dimension. | ||
|
||
""" | ||
# Create the bounds array. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nitpick: Feels like a superfluous comment. |
||
bounds_array = get_bounds_array(prefetch_dataset, | ||
dimension_variable.full_name_path) | ||
|
||
# Create the second bounds dimension. | ||
dimension_name = str(PurePosixPath(dimension_variable.full_name_path).name) | ||
dimension_group = str(PurePosixPath(dimension_variable.full_name_path).parent) | ||
bounds_name = dimension_name + '_bnds' | ||
bounds_full_path_name = dimension_variable.full_name_path + '_bnds' | ||
bounds_dimension_name = dimension_name + 'v' | ||
# Consider the special case when the dimension group is the root directory. | ||
# The dimension can't refer to the full path in the name itself, so we have | ||
# to create it with respect to the group we want to place it in. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (Totally just a) Nitpick: Maybe this comment could be truncated a tiny bit, and as it is specific to the first condition in the
|
||
if dimension_group == '/': | ||
bounds_dim = prefetch_dataset.createDimension(bounds_dimension_name, 2) | ||
else: | ||
bounds_dim = prefetch_dataset[dimension_group].createDimension(bounds_dimension_name, 2) | ||
|
||
# Dimension variables only have one dimension - themselves. | ||
variable_dimension = prefetch_dataset[dimension_variable.full_name_path].dimensions[0] | ||
|
||
bounds_data_type = str(dimension_variable.data_type) | ||
bounds = prefetch_dataset.createVariable(bounds_full_path_name, | ||
bounds_data_type, | ||
(variable_dimension, | ||
bounds_dim,)) | ||
|
||
# Write data to the new variable in the prefetch dataset. | ||
bounds[:] = bounds_array[:] | ||
|
||
# Update varinfo attributes and references. | ||
prefetch_dataset[dimension_variable.full_name_path].setncatts({'bounds': bounds_name}) | ||
dimension_variable.references['bounds'] = {bounds_name, } | ||
dimension_variable.attributes['bounds'] = bounds_name | ||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
def is_dimension_ascending(dimension: MaskedArray) -> bool: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -245,6 +245,20 @@ | |
], | ||
"_Description": "Ensure variables in /Soil_Moisture_Retrieval_Data_Polar_PM group point to correct coordinate variables." | ||
}, | ||
{ | ||
"Applicability": { | ||
"Mission": "ICESat2", | ||
"ShortNamePath": "ATL16", | ||
"Variable_Pattern": ".*_grid_(lat|lon)" | ||
}, | ||
"Attributes": [ | ||
{ | ||
"Name": "cell_alignment", | ||
"Value": "edge" | ||
} | ||
], | ||
"_Description": "ATL16 has edge-aligned grid cells." | ||
}, | ||
{ | ||
"Applicability": { | ||
"Mission": "ICESat2", | ||
|
@@ -357,6 +371,19 @@ | |
} | ||
], | ||
"_Description": "Ensure the latitude and longitude dimension variables know their associated grid_mapping variable." | ||
}, | ||
{ | ||
"Applicability": { | ||
"Mission": "ICESat2", | ||
"ShortNamePath": "ATL16", | ||
"Variable_Pattern": "/spolar_(asr_obs_grid|lorate_blowing_snow_freq)" | ||
}, | ||
"Attributes": [ | ||
{ | ||
"Name": "grid_mapping", | ||
"Value": "crs_latlon: spolar_grid_lat crs_latlon: spolar_grid_lon" | ||
} | ||
] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This entry addresses what we think is a mistake in the ATL16 v004 data for the two indicated variables, where the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the record, that grid_mapping value is referencing the crs_latlon grid-mapping variable for both horizontal dimensions (lat, lon), here explicitly identified in the grid_mapping reference value. A simpler entry would be, I think, to specify "crs_latlon" as a grid_mapping variable that implicitly applies to all dimensions. This posted override is in keeping with the other ATL16 grid_mapping reference attributes that explicitly call out the dimensional applicability. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Totally agree that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I almost wonder if HOSS should be a bit less "diligent" in reading those attributes and instead use other heuristics (likely already in use) for identifying the relevant horizontal dimensions for grid mapping references. They seem superfluous in this case, and then they got them wrong! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it HOSS reading those attributes, or varinfo? I saw that varinfo specifically checks the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point @lyonthefrog - it is |
||
} | ||
], | ||
"CF_Supplements": [ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that these are our public release notes, it might be worth explicitly stating in some way that the pseudo-bounds variables are not returned in the actual HOSS output.