Summary
An inappropriate fill value is set when l2ss-py creates an empty dataset copy. Subsequent processing then fails because the dataset is not truly empty: a data variable holds a "valid" value where a true fill value is expected.
Description of the problem
When there are no data points that match the requested spatiotemporal conditions, l2ss-py creates an empty dataset copy here. @ank1m and I discovered an edge case where a valid value is being placed in the new, copied variable, instead of the expected null or fill value. This occurred for the following "ground_pixel_quality_flag" variable, which notably has an integer type (int32) and has no declared '_FillValue' attribute:
Here is a screenshot showing the variable, in a TEMPO collection:
Since this variable, "support_data/ground_pixel_quality_flag", doesn't have a '_FillValue', l2ss-py tries to create an empty array using np.nan instead. But because the variable is of type 'int32', it cannot hold np.nan. Instead, the code raises the following warning:
RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(a, fill_value, casting='unsafe')
and then falls back to using 0 instead of np.nan.
However, 0 is a valid value for this variable (see the valid_min and valid_max attributes in the above screenshot), so subsequent operations see a valid array, rather than an empty, or all-fill-value, array.
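The cast problem can be reproduced outside l2ss-py with a minimal sketch. It assumes only NumPy and uses `np.full` as a stand-in for whatever l2ss-py actually calls internally. Whether the warning appears depends on the NumPy version, and the integer that ends up in the array can differ by platform, so the sketch prints both rather than assuming them.

```python
import warnings

import numpy as np

# Ask for an int32 array filled with NaN, which integers cannot represent.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    arr = np.full((1,), np.nan, dtype=np.int32)

# Show any warnings NumPy emitted during the cast.
for w in caught:
    print(f"{w.category.__name__}: {w.message}")

# The array holds an ordinary integer, never NaN. The exact integer is not
# guaranteed, so it is printed rather than assumed.
print(arr.dtype, arr)
print("contains NaN:", bool(np.isnan(arr.astype(np.float64)).any()))
```

Whatever integer the cast produces, downstream code has no way to tell it apart from real data unless it happens to match a declared fill value.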
Impact
This causes a failure during the below service chain call, after the "Stitchee" service tries to determine whether the files coming from l2ss-py are empty. Stitchee considers the file as "not empty" here because the variable's single value is not a fill value or null.
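To illustrate why a single 0 makes the file look non-empty, here is a hypothetical emptiness check. It is not Stitchee's actual implementation; the function name `looks_empty` and its logic are assumptions chosen only to show the general idea that a value counts as "no data" only if it is NaN or equals a known fill value.

```python
import numpy as np


def looks_empty(values, fill_value=None):
    """Hypothetical check: True if every element is NaN or the fill value."""
    values = np.asarray(values)
    if values.size == 0:
        return True
    # Only floating-point arrays can contain NaN.
    if values.dtype.kind == "f":
        is_null = np.isnan(values)
    else:
        is_null = np.zeros(values.shape, dtype=bool)
    # With no declared fill value there is nothing else to compare against.
    if fill_value is not None:
        is_null |= values == fill_value
    return bool(is_null.all())


# An int32 variable with no _FillValue, holding the fallback 0:
print(looks_empty(np.array([0], dtype=np.int32)))
# The same variable written with an explicit, known fill value:
print(looks_empty(np.array([-2147483647], dtype=np.int32), fill_value=-2147483647))
```

Under these assumptions the first call reports the variable as containing data, which matches the behavior described above.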
Steps to reproduce
The following request currently fails: https://harmony.uat.earthdata.nasa.gov/C1262899916-LARC_CLOUD/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset?forceAsync=true&granuleId=G1269044803-LARC_CLOUD%2CG1269044708-LARC_CLOUD%2CG1269044681-LARC_CLOUD%2CG1269044688-LARC_CLOUD%2CG1269044514-LARC_CLOUD%2CG1269044741-LARC_CLOUD%2CG1269044710-LARC_CLOUD%2CG1269044439-LARC_CLOUD%2CG1269044715-LARC_CLOUD%2CG1269044815-LARC_CLOUD%2CG1269044726-LARC_CLOUD%2CG1269044787-LARC_CLOUD%2CG1269044827-LARC_CLOUD%2CG1269044658-LARC_CLOUD%2CG1269044679-LARC_CLOUD%2CG1269044727-LARC_CLOUD&subset=lat(32.56485%3A42.82943)&subset=lon(-135.7248%3A-52.76692)&subset=time(%222024-08-02T00%3A00%3A00.000Z%22%3A%222024-08-02T10%3A39%3A37.000Z%22)&concatenate=true&skipPreview=true
Desired change
An appropriate fill or null value for each variable's dtype is used when creating an "empty" dataset.
I think that means the dataset copy in l2ss should either:
take the dtype into account and use the appropriate default _FillValue for that dtype from the start (such as from netCDF4.default_fillvals), or
catch the invalid-cast warning and then determine an appropriate _FillValue.
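The first option could look roughly like the sketch below. The helper names `fill_value_for` and `empty_like_filled` are hypothetical and not part of l2ss-py. The sketch assumes `netCDF4.default_fillvals` is keyed by strings such as `"i4"`, and it includes a small fallback table with the standard netCDF defaults so it still runs where netCDF4 is not installed. Only numeric dtypes are handled.

```python
import numpy as np

try:
    from netCDF4 import default_fillvals
except ImportError:
    # Fallback subset of the netCDF default fill values (assumption: used
    # only when the netCDF4 package is unavailable).
    default_fillvals = {
        "i1": -127, "u1": 255,
        "i2": -32767, "u2": 65535,
        "i4": -2147483647, "u4": 4294967295,
        "f4": 9.969209968386869e36, "f8": 9.969209968386869e36,
    }


def fill_value_for(dtype, declared_fill=None):
    """Hypothetical helper: choose a fill value the dtype can represent."""
    dtype = np.dtype(dtype)
    if declared_fill is not None:
        # A declared _FillValue always wins.
        return dtype.type(declared_fill)
    if dtype.kind == "f":
        # Floats can hold NaN, so keep the NaN behavior for them.
        return dtype.type(np.nan)
    # Integers: look up the netCDF default, e.g. "i4" for int32.
    return dtype.type(default_fillvals[f"{dtype.kind}{dtype.itemsize}"])


def empty_like_filled(shape, dtype, declared_fill=None):
    """Hypothetical helper: build an all-fill array without a lossy cast."""
    return np.full(shape, fill_value_for(dtype, declared_fill), dtype=dtype)


flags = empty_like_filled((1,), np.int32)
print(flags.dtype, flags)
```

If this approach were taken, the chosen value would presumably also need to be written as the variable's `_FillValue` attribute in the output file, so that downstream services can recognize it as fill.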
Here is a visualization of the spatial and temporal attributes of the granule that is not being processed correctly for the above referenced Harmony request:
The requested time window is
min_time="2024-08-02T00:00:00.000Z",
max_time="2024-08-02T10:39:37.000Z"
The times in the granule fall just after the requested time window, which is why the subsetter finds no matches to the spatiotemporal conditions during processing.
@danielfromearth Would this still be an issue if l2ss-py were to not return any files in these cases of no data? That is my understanding of the implications of the decision in https://bugs.earthdata.nasa.gov/browse/TRT-36 to return no files in cases of no data.
We'll be looking at changing the implementation of l2ss-py as part of #308 so not urgent.
I think a request that produces an empty file will be handled at the Harmony level, and batchee will be called with a catalog that includes only items with "good" files (the empty file will be excluded). If so, the batchee/stitchee operation should not be affected. But we can certainly test it and change stitchee to follow the TRT-36 regulations as needed.