[Bug] Empty dataset not empty #300

Open
danielfromearth opened this issue Oct 31, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@danielfromearth
Collaborator

danielfromearth commented Oct 31, 2024

Summary


An inappropriate fill value is set when l2ss-py creates an empty dataset copy. Subsequent processing then fails because the dataset is not truly empty: a data variable contains a "valid" value rather than a true fill value.

Description of the problem


When no data points match the requested spatiotemporal conditions, l2ss-py creates an empty dataset copy here. @ank1m and I discovered an edge case in which a valid value is placed in the new, copied variable instead of the expected null or fill value. This occurred for the "ground_pixel_quality_flag" variable, which notably has an integer type (int32) and no declared '_FillValue' attribute.

Here is a screenshot showing the variable, in a TEMPO collection:
image

Since this variable, "support_data/ground_pixel_quality_flag", doesn't have a '_FillValue', l2ss-py tries to create an empty array filled with np.nan instead. But because this variable is of type 'int32', it can't represent np.nan!

Instead, the code emits a
RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(a, fill_value, casting='unsafe')
and then falls back to 0 instead of np.nan.

However, 0 is a valid value for this variable (see the valid_min and valid_max attributes in the above screenshot), so subsequent operations see a valid array, rather than an empty, or all-fill-value, array.
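
For reference, a minimal sketch of the behavior described above, assuming only NumPy. The helper names `can_hold_nan` and `fill_with_nan` are made up for illustration and are not part of l2ss-py. Whether the warning is emitted, and which integer the NaN turns into, may depend on the NumPy version and platform, so the sketch does not assume a particular result.

```python
import warnings

import numpy as np


def can_hold_nan(dtype):
    """Return True if the dtype can represent NaN (float or complex types)."""
    return np.issubdtype(np.dtype(dtype), np.inexact)


def fill_with_nan(shape, dtype):
    """Fill an array with NaN regardless of dtype and collect any warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        arr = np.full(shape, np.nan, dtype=dtype)
    return arr, [str(w.message) for w in caught]


# int32 cannot hold NaN, so the stored value is whatever the cast produces.
arr, messages = fill_with_nan((1,), np.int32)
print(can_hold_nan(np.int32), arr, messages)
```

Checking `can_hold_nan` before filling would let the caller pick a different fill value for integer variables instead of relying on the cast.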

Impact


This causes the service chain call below to fail when the "Stitchee" service tries to determine whether the files coming from l2ss-py are empty. Stitchee considers the file "not empty" here because the variable's single value is neither a fill value nor null.
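
To illustrate why a 0 defeats this kind of check, here is a rough sketch of an "all fill or null" test. This is not Stitchee's actual implementation; `is_all_fill` is a hypothetical helper, and the only assumption is that a variable counts as empty when every value is NaN or equals its fill value.

```python
import numpy as np


def is_all_fill(values, fill_value=None):
    """Return True if every element is NaN or equal to the fill value."""
    values = np.asarray(values)
    if np.issubdtype(values.dtype, np.inexact):
        is_null = np.isnan(values)
    else:
        # Integer arrays cannot contain NaN.
        is_null = np.zeros(values.shape, dtype=bool)
    if fill_value is not None:
        is_null = is_null | (values == fill_value)
    return bool(is_null.all())


# A single 0 with no declared fill value looks like real data.
print(is_all_fill(np.array([0], dtype=np.int32)))
```

With no '_FillValue' attribute to compare against, an int32 array holding 0 can only be read as real data.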

Steps to reproduce


The following request currently fails: https://harmony.uat.earthdata.nasa.gov/C1262899916-LARC_CLOUD/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset?forceAsync=true&granuleId=G1269044803-LARC_CLOUD%2CG1269044708-LARC_CLOUD%2CG1269044681-LARC_CLOUD%2CG1269044688-LARC_CLOUD%2CG1269044514-LARC_CLOUD%2CG1269044741-LARC_CLOUD%2CG1269044710-LARC_CLOUD%2CG1269044439-LARC_CLOUD%2CG1269044715-LARC_CLOUD%2CG1269044815-LARC_CLOUD%2CG1269044726-LARC_CLOUD%2CG1269044787-LARC_CLOUD%2CG1269044827-LARC_CLOUD%2CG1269044658-LARC_CLOUD%2CG1269044679-LARC_CLOUD%2CG1269044727-LARC_CLOUD&subset=lat(32.56485%3A42.82943)&subset=lon(-135.7248%3A-52.76692)&subset=time(%222024-08-02T00%3A00%3A00.000Z%22%3A%222024-08-02T10%3A39%3A37.000Z%22)&concatenate=true&skipPreview=true

Desired change


An appropriate fill or null value for each variable's dtype should be used when creating an "empty" dataset.

I think that means the dataset copy in l2ss should either:

  • take into account the dtype, and use the appropriate default _FillValue for that dtype to begin with (such as from netCDF4.default_fillvals), or
  • catch the invalid type warning, and then determine an appropriate _FillValue
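
A minimal sketch of the first option, assuming NumPy and, if installed, netCDF4. `default_fill_for` and `empty_like_variable` are hypothetical names, not existing l2ss-py functions. The fallback table copies the standard netCDF default fill values so the sketch also runs without netCDF4.

```python
import numpy as np

try:
    from netCDF4 import default_fillvals
except ImportError:
    # Assumption: fallback copy of the standard netCDF default fill values.
    default_fillvals = {
        "S1": "\x00",
        "i1": -127, "u1": 255,
        "i2": -32767, "u2": 65535,
        "i4": -2147483647, "u4": 4294967295,
        "i8": -9223372036854775806, "u8": 18446744073709551614,
        "f4": 9.969209968386869e36, "f8": 9.969209968386869e36,
    }


def default_fill_for(dtype):
    """Look up the netCDF default fill value for a NumPy dtype."""
    dtype = np.dtype(dtype)
    key = f"{dtype.kind}{dtype.itemsize}"  # e.g. int32 -> "i4"
    try:
        return default_fillvals[key]
    except KeyError:
        raise TypeError(f"no default fill value for dtype {dtype}") from None


def empty_like_variable(shape, dtype, declared_fill=None):
    """Build an all-fill array, preferring the variable's own _FillValue."""
    dtype = np.dtype(dtype)
    fill = declared_fill if declared_fill is not None else default_fill_for(dtype)
    return np.full(shape, fill, dtype=dtype), fill


arr, fill = empty_like_variable((3,), np.int32)
print(arr, fill)
```

The chosen fill value would presumably also need to be written as the variable's '_FillValue' attribute in the output, so that downstream services can recognize the array as all fill.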
@danielfromearth
Collaborator Author

danielfromearth commented Oct 31, 2024

Here is a visualization of the spatial and temporal attributes of the granule that is not being processed correctly for the Harmony request referenced above:

image

The requested time window is
min_time="2024-08-02T00:00:00.000Z",
max_time="2024-08-02T10:39:37.000Z"

The times in the granule are just after the request time window, so that is why no matches to the spatiotemporal conditions are found during the subsetter's processing.
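
As a rough illustration of the time check, the sketch below uses the request window from this issue together with made-up observation times. The real granule times are only shown in the image above, so the `observation_times` values are purely hypothetical, as are the helper names.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse a timestamp such as 2024-08-02T00:00:00.000Z as UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def in_window(times, min_time, max_time):
    """Return one boolean per time: True if it falls inside the window."""
    return [min_time <= t <= max_time for t in times]


min_time = parse_utc("2024-08-02T00:00:00.000Z")
max_time = parse_utc("2024-08-02T10:39:37.000Z")

# Hypothetical observation times that start just after the window closes.
observation_times = [
    parse_utc("2024-08-02T10:39:40.000Z"),
    parse_utc("2024-08-02T10:42:00.000Z"),
]

mask = in_window(observation_times, min_time, max_time)
print(any(mask))
```

When no element of the mask is true, the subsetter has nothing to keep and takes the empty-dataset path.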

@danielfromearth danielfromearth added the bug Something isn't working label Oct 31, 2024
@frankinspace
Member

@danielfromearth Would this still be an issue if l2ss-py were to not return any files in these cases of no data? That is my understanding of the implications of the decision in https://bugs.earthdata.nasa.gov/browse/TRT-36 to return no files in cases of no data.

We'll be looking at changing the implementation of l2ss-py as part of #308 so not urgent.

@ank1m

ank1m commented Jan 23, 2025

I think a request that produces an empty file will be handled at the Harmony level, and batchee will be called with a catalog that includes only items with "good" files (the empty file will be excluded). If so, the batchee/stitchee operation should not be affected. But we can certainly test it and change stitchee to follow the TRT-36 decision as needed.

3 participants