Add percentile valid points in get_stats() #644

Open
wants to merge 2 commits into
base: main

vschaffn
Contributor

vschaffn commented Jan 24, 2025

Resolves GlacioHack/xdem#679.

Description

  • Add the percentile valid points statistic, computed as the number of non-NaN values divided by the total number of values in the Raster.
  • Add percentile valid points to the dict returned by get_stats(), along with aliases.
  • Add percentile valid points to the test_stats() method in test_raster.
  • Remove rmse, since computing it on a single DEM/Raster is not relevant when the standard deviation is already computed.

@vschaffn vschaffn force-pushed the 679-valid_points_stat branch from 2b4958b to 5368572 Compare January 24, 2025 10:18
@rhugonnet
Member

Nice addition, and good catch on the redundancy of the RMSE in this case! 😄

Two small remarks:

Last thought: I didn't see any function to calculate the valid points in the changes, maybe it's not there yet! In that case I would recommend np.count_nonzero applied to np.isfinite, instead of ~np.isnan, as the latter considers only NaNs but not +/- infinity that are often unusable in stats (and I've had some misadventures with those being propagated in raster data before!).
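
To illustrate the difference between the two checks, here is a minimal NumPy sketch. The array `values` is made up for the example and is not taken from the PR; the point is only that `~np.isnan` still counts infinities as valid, while `np.isfinite` does not.

```python
import numpy as np

# Hypothetical sample data: three usable values, one NaN, two infinities.
values = np.array([1.0, 2.0, np.nan, np.inf, -np.inf, 5.0])

# Counts everything that is not NaN, so +/- infinity slips through.
count_not_nan = np.count_nonzero(~np.isnan(values))

# Counts only finite values, excluding both NaN and +/- infinity.
count_finite = np.count_nonzero(np.isfinite(values))

print(count_not_nan, count_finite)  # 5 3
```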

@adehecq
Member

adehecq commented Jan 29, 2025

Good addition!
Two other thoughts:

  • I believe your implementation does not exclude masked values. Remember that self.data is a masked array, so invalid pixels are masked instead of being set to NaN. One quick way to get the total number of unmasked values is self.data.compressed().size. It might also be good to add 1-2 tests. For example, the "exploradores_aster_dem" example has data gaps.
  • The name of your variable is incorrect. It should be "percentage", not "percentile".
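
The masked-array point can be sketched with plain NumPy. The array below and its -9999 nodata value are assumptions made for the example and stand in for `self.data`; this is not the PR's implementation. A NaN-based count on the underlying buffer misses the masked pixel, whereas `compressed().size` (or the equivalent `count()` method of masked arrays) does not.

```python
import numpy as np

# Hypothetical 2x3 raster band where -9999 marks a data gap (assumed nodata value).
raw = np.array([[1.0, 2.0, -9999.0],
                [4.0, 5.0, 6.0]])
data = np.ma.masked_array(raw, mask=(raw == -9999.0))

# A NaN check on the underlying buffer ignores the mask entirely.
naive_count = np.count_nonzero(~np.isnan(data.data))

# compressed() returns a 1-D array of the unmasked values only.
unmasked_count = data.compressed().size

# MaskedArray.count() gives the same number without building a new array.
unmasked_count_alt = data.count()

print(naive_count, unmasked_count, unmasked_count_alt)  # 6 5 5
```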

Regarding @rhugonnet's comment:

It might be useful to some users to know the total point count (NaNs included)? If we report a "total count" and "valid count", then users can derive the percent of valid points themselves by dividing the two.

Not so easy to figure out which ones are the most useful between total count, valid count and fraction of valid pixels. It does not make so much sense to give all 3 as one can derive the 3rd from the 2 others... But I think the percentage of valid pixel is more useful than a count (which is generally gonna be large and not very convenient). Also, total count can be easily derived from the data shape... So I would go for either just the percentage of valid pixels, or percentage + total valid count.
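
As a rough sketch of the "percentage + total valid count" option, the helper below combines both reviewers' suggestions: it drops masked pixels and then keeps only finite values. The function name `valid_point_stats` and the dictionary keys are hypothetical and are not part of the `get_stats()` API; the total count is taken from the array size, as noted above.

```python
import numpy as np


def valid_point_stats(data: np.ma.MaskedArray) -> dict:
    """Return valid count and percentage of valid points (hypothetical helper)."""
    # Total number of pixels, masked or not (equals the product of data.shape).
    total_count = data.size
    # Valid pixels are unmasked AND finite (neither NaN nor +/- infinity).
    valid_count = int(np.count_nonzero(np.isfinite(data.compressed())))
    # Guard against empty arrays to avoid a division by zero.
    percentage = 100.0 * valid_count / total_count if total_count else float("nan")
    return {"valid_count": valid_count, "percentage_valid_points": percentage}


# Made-up 2x4 example: one NaN, one infinity, and one masked pixel.
values = np.array([[1.0, 2.0, np.nan, np.inf],
                   [5.0, 6.0, 7.0, 8.0]])
mask = np.zeros(values.shape, dtype=bool)
mask[1, 3] = True  # mask the last pixel
stats = valid_point_stats(np.ma.masked_array(values, mask=mask))

print(stats)  # {'valid_count': 5, 'percentage_valid_points': 62.5}
```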

Development

Successfully merging this pull request may close these issues.

Add percent_valid_points statistic to the get_stats() method in the Raster class
3 participants