I wanted to make sure the conversation we had in #770 wasn't lost, but felt it should be an issue separate from #738. While handling nodata comparisons for nan values across InVEST, @davemfish and @emlys brought up some points about better type contracts and the possibility of handling nodata from user input up front in the models.
When thinking about nodata comparisons against LULC rasters and other arrays that should be of integer type:
@davemfish: But I'm starting to think the nicest solution might be a contract that all LULC rasters will be integer type and then simple == here. That contract might make a lot of things easier. This stuff & reclassifications.
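As a rough illustration of that contract, here is a minimal sketch. The helper name `lulc_equals_nodata` is hypothetical and not part of InVEST or pygeoprocessing; it assumes the contract is enforced by checking the dtype before doing an exact `==` comparison.

```python
import numpy


def lulc_equals_nodata(lulc_array, nodata):
    """Return a boolean mask of nodata pixels in an integer LULC array.

    Hypothetical helper: enforces the proposed contract that LULC arrays
    are integer typed, so an exact ``==`` comparison is safe.
    """
    if not numpy.issubdtype(lulc_array.dtype, numpy.integer):
        raise TypeError(
            f'LULC arrays must be integer type, got {lulc_array.dtype}')
    if nodata is None:
        # No nodata value defined, so no pixel is masked.
        return numpy.zeros(lulc_array.shape, dtype=bool)
    return lulc_array == nodata


lulc = numpy.array([[1, 2], [255, 3]], dtype=numpy.uint8)
mask = lulc_equals_nodata(lulc, 255)
```

Failing fast on a float LULC array is one possible design choice; casting after validating that every value is integral would be another.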
Emily wrote a nice section on having a holistic nodata scheme that can be expected across InVEST models; see below.
I'd consider using nan for nodata to be kind of a poor practice. In that case it's a quirk of user-provided data that we can handle, but change right away. We could consider making an invariant like "All arrays (outside of a certain utils function) will have a numeric nodata value."
One way to do that would be a variation of, or wrapper around, pygeoprocessing.raster_to_numpy_array that reassigns nan nodata values to an appropriate number.
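A minimal sketch of the reassignment step, working on an array already in memory rather than wrapping the raster read itself. The function name `replace_nan_nodata` and the choice of the dtype's minimum value as the replacement are assumptions for illustration only.

```python
import numpy


def replace_nan_nodata(array, nodata, new_nodata=None):
    """Return ``(array, nodata)`` with a NaN nodata swapped for a number.

    Hypothetical helper. If ``nodata`` is not NaN, the inputs are returned
    unchanged. A NaN nodata implies a floating-point array.
    """
    if nodata is None or not numpy.isnan(nodata):
        return array, nodata
    if new_nodata is None:
        # Assumption: the most negative finite value of the dtype is unused.
        new_nodata = numpy.finfo(array.dtype).min
    cleaned = numpy.where(
        numpy.isnan(array), new_nodata, array).astype(array.dtype)
    return cleaned, new_nodata


raster = numpy.array([1.0, numpy.nan, 3.0], dtype=numpy.float32)
cleaned, numeric_nodata = replace_nan_nodata(raster, float('nan'))
```

After this step, downstream code could compare against `numeric_nodata` without any special handling for nan.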
a tangent
To extend that thought, what if the nodata value in an array was always a standard value associated with its dtype, rather than defining different nodata values as module-level variables in each model? We could create all arrays such that the nodata value is determined by the dtype (for example, the dtype's max value).
And then assume that's the nodata value, unless otherwise specified:
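import numpy

def array_equals_nodata(array, nodata=None):
    """Return a boolean mask of where ``array`` equals its nodata value."""
    is_integer = numpy.issubdtype(array.dtype, numpy.integer)
    if nodata is None:
        # Default to the dtype's max value; iinfo only accepts integer dtypes.
        info = numpy.iinfo if is_integer else numpy.finfo
        nodata = info(array.dtype).max
    if is_integer:
        return array == nodata
    return numpy.isclose(array, nodata)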
Then the invariant would be: "All arrays (outside a certain utils function) will have a nodata value equal to the max value for their dtype."
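To make the invariant concrete, here is a small sketch of how arrays might be created under it. Both `default_nodata` and `full_nodata` are hypothetical names, and treating only integer and floating dtypes as supported is an assumption.

```python
import numpy


def default_nodata(dtype):
    """Hypothetical helper: the max representable value for ``dtype``."""
    dtype = numpy.dtype(dtype)
    if numpy.issubdtype(dtype, numpy.integer):
        return numpy.iinfo(dtype).max
    if numpy.issubdtype(dtype, numpy.floating):
        return numpy.finfo(dtype).max
    raise TypeError(f'no default nodata for dtype {dtype}')


def full_nodata(shape, dtype):
    """Create an array of ``shape`` filled with the dtype's default nodata."""
    return numpy.full(shape, default_nodata(dtype), dtype=dtype)


target = full_nodata((2, 2), numpy.float32)
```

Valid pixels would then be written into `target` in place, so any pixel never written keeps the dtype-determined nodata value.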
Related, I recently came across the numpy masked array module. I haven't compared its efficiency to our current strategy of using separate mask arrays, but it could be neat to try out.
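For reference, a minimal sketch of what the masked array approach might look like, using `numpy.ma.masked_equal` on an integer array. The values and the dtype-max nodata are made up for illustration, and nothing here speaks to how its performance compares with separate mask arrays.

```python
import numpy

# Assumption for illustration: nodata is the max value of the dtype.
nodata = numpy.iinfo(numpy.int16).max
array = numpy.array([10, nodata, 30, 20], dtype=numpy.int16)

# Mask every element equal to nodata; operations then skip masked elements.
masked = numpy.ma.masked_equal(array, nodata)
mean_valid = masked.mean()

# Arithmetic propagates the mask; filled() writes nodata back for output.
shifted = (masked - 5).filled(nodata)
```

For floating-point arrays, `numpy.ma.masked_values` does an approximate comparison instead of an exact one.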
This discussion was converted from issue #783 on May 04, 2023 16:55.