Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide better error handling for NA/NaN keys in value map in reclassify raster function #419

Closed
claire-simpson opened this issue Dec 5, 2024 · 7 comments · Fixed by #420

Comments

@claire-simpson
Copy link
Contributor

Currently reclassify_raster will fail if there is a NA key in the value map. This is because we are using np.isclose to check if any of the value_map keys are close to the nodata value, and isclose cannot handle NA values.

The current error message produced will be either TypeError: boolean value of NA is ambiguous (if trying to check pandas.NA against another value, as occurs when value map is created from a csv with an empty row value), or TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' (if comparing a value (including NA) to NA; possibly in other situations).

@claire-simpson
Copy link
Contributor Author

The difference in error messages for the numpy.isclose function arises from using different NumPy versions.

Earlier versions of InVEST (and Workbench) used NumPy 1.24, which produced the latter error when calling numpy.isclose(pandas.NA, 255.0). In contrast, the current version of InVEST uses NumPy 2.1, which generates the former error when comparing pandas.NA to a float.

@phargogh
Copy link
Member

phargogh commented Dec 6, 2024

isclose cannot handle NA values

Are we (and should we be) using the equal_nan=True flag for numpy.isclose() comparisons? And, related, will pandas.NA give us reliable, expected results when provided to numpy.isclose()? Hopefully there isn't too much of a behavioral difference between numpy 1.24 vs numpy 2.1 aside from the text of the exception.

@claire-simpson
Copy link
Contributor Author

Using equal_nan=True allows numpy NaN values to evaluate against each other as true. I can't see a scenario in which we would not want these to equal. However, if numpy.nan is a key in the value_map, an exception will be raised later in the function.

Regardless of how equal_nan is set, pandas.NA will cause np.isclose to throw some TypeError in both numpy versions.

For more context: The function is using numpy.isclose in the context of checking if values in value_map to reclassify are close to the raster NoData value derived using gdal. An NA value can become a key in the value_map when the value_map is built from a csv which contains an empty cell in the key column. To solve the numpy.isclose error, we could just ensure that empty cells are read in as numpy.nan values (which isclose can evaluate), however (as mentioned above) a numpy.nan key will cause an issue later in the function. Is it valuable for users to be able to reclassify NA/NaN values? Given that this issue has not come up before I'd imagine users are not trying to do this

@davemfish
Copy link
Contributor

Is it valuable for users to be able to reclassify NA/NaN values? Given that this issue has not come up before I'd imagine users are not trying to do this

I think this is the key question. For pygeoprocessing, the value_map is a dict so I don't think we should worry too much about how data from a CSV might end up being represented in the value_map.

I do think it's possible for a raster to have numpy.nan values in it, so I think it would be reasonable to support those as valid value_map entries and allow reclassification.

I don't think it's necessary to support None or pandas.NA or any other type that cannot possible be a pixel value. Throwing an exception on those seems fine. And I guess the question is, should we try to throw a friendlier exception, or just let the function error as it does now. Curious if others have opinions.

@davemfish
Copy link
Contributor

a numpy.nan key will cause an issue later in the function.

I'm not exactly sure where this comes up, and I think it's kind of a separate issue from the non-numeric cases (pandas.NA, None, etc), but the function is not currently using equal_nan in the is_close calls and it likely should be.

@davemfish
Copy link
Contributor

but the function is not currently using equal_nan in the is_close calls and it likely should be.

Or not, if the function is not intended to operate on floats.

@claire-simpson
Copy link
Contributor Author

Ok so the proposed solution as of now is to check for non-numeric keys in the value_map, right? Does it make sense to define a custom error class if a non-numeric key is detected? I think this would be beneficial if we intend to update the wrapper function in utils.py to catch this error (and provide a custom message similar to what this function is throwing now when there is an NA/None key). This would be similar to how ReclassificationMissingValuesError is used. On the other hand, if we are happy with how the reclassify_raster function in utils is operating currently (checking for NA/None keys up front before trying to run reclassify_raster) its probably not necessary for pygeoprocessing to contain a new error class for its own sake.

davemfish added a commit that referenced this issue Dec 19, 2024
Check for non-numeric keys in value_map in reclassify raster function #419
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants