Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: options for zonal_statistics rasterization algorithm #312

Open
emlys opened this issue Mar 22, 2023 · 8 comments
Open

Proposal: options for zonal_statistics rasterization algorithm #312

emlys opened this issue Mar 22, 2023 · 8 comments
Labels
proposal Proposals requiring team feedback

Comments

@emlys
Copy link
Member

emlys commented Mar 22, 2023

how it works now

zonal_statistics rasterizes aggregate polygons. GDAL provides two options:

  • ALL_TOUCHED=False (default): pixels are "burned in" to the raster if their centerpoint falls within the aggregate polygon
  • ALL_TOUCHED=True: pixels are "burned in" to the raster if they touch the aggregate polygon at all

zonal_statistics uses ALL_TOUCHED=False. Some aggregate polygons can end up with no pixels at all (if they don't happen to overlap any pixels' centerpoint). zonal_statistics handles this case like so:

for each polygon with no pixels:
    calculate the polygon's bounding box
    read in the raster data within that bounding box
    calculate stats based on that window of data

proposed changes

Add a kwarg all_touched=False to zonal_statistics. Pass this value to gdal.RasterizeLayer. Remove the special handling of unset polygons.

  • if all_touched=False, the results will generally be more similar to the current version, but it's possible that some polygons will have no stats calculated (if they don't overlap any pixel centerpoint).
  • if all_touched=True, it's guaranteed that all polygons within the defined area of the raster will have stats calculated.

benefits

  • simpler to describe & understand the expected behavior
  • consistent with GDAL options
  • probably faster (I haven't tested it, but it would eliminate a lot of lines of code)
  • avoid edge cases where the bounding box approximation could be very inaccurate

examples

image

@dcdenu4
Copy link
Member

dcdenu4 commented Mar 24, 2023

James shared a link for the gdal rasterization source: https://github.com/OSGeo/gdal/blob/89e3fc244652771ca4f8abc0a6fe5794e2201b26/alg/gdalrasterize.cpp#L759

@davemfish
Copy link
Contributor

@emlys
Copy link
Member Author

emlys commented Mar 24, 2023

We determined that this needs some more information to make a decision -

  • Test these cases and more with GDAL to see if it's actually doing what I think it's doing. Possible weirdness with Bresenham's line algorithm?
  • If ALL_TOUCHED=True, how do we handle the case where two disjoint polygons touch the same pixel?
  • How do other zonal statistics implementations work (QGIS, rasterstats)?

@dcdenu4 dcdenu4 added the proposal Proposals requiring team feedback label Apr 27, 2023
@davemfish
Copy link
Contributor

Here's another zonal stats library to keep an eye on. I think python bindings are in the works. https://github.com/isciences/exactextract

Their readme also includes a comparison of other implementations.

@davemfish
Copy link
Contributor

Dave shared some background from what QGIS is doing: https://github.com/qgis/QGIS/blob/d5626d92360efffb4b8085389c8d64072ef65833/src/analysis/vector/qgszonalstatistics.cpp#L266 Talked about in this SO post: https://gis.stackexchange.com/questions/276794/how-does-qgis-zonal-statistics-handle-partially-overlapping-pixels

Here's the interesting bit of how QGIS handles very small polygons relative to the grid size:

https://github.com/qgis/QGIS/blob/d5626d92360efffb4b8085389c8d64072ef65833/src/analysis/vector/qgszonalstatistics.cpp#L266

    if ( featureStats.count <= 1 )
    {
      //the cell resolution is probably larger than the polygon area. We switch to precise pixel - polygon intersection in this case
      statisticsFromPreciseIntersection( featureGeometry, offsetX, offsetY, nCellsX, nCellsY, cellsizeX, cellsizeY,
                                         rasterBBox, featureStats );
    }

@dcdenu4
Copy link
Member

dcdenu4 commented Sep 26, 2024

In our discussion today I think the general take aways were:

  • Moving forward with Emily's proposed solution
  • Open question: how to handle current implementation and historic context? Add as an optional way to run zonal stats?
  • Look into QGIS, SAGA, other implementations of rasterize and zonal stats and maybe have a companion info doc.

@phargogh
Copy link
Member

phargogh commented Sep 26, 2024

Also of interest for looking at different rasterization methods, this paper might be of interest: https://searchworks.stanford.edu/articles/edseee__edseee.9942350. I have not yet read this, so this might not end up being relevant at all.. This paper is for smooth surfaces, which isn't immediately applicable to our regular work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal Proposals requiring team feedback
Projects
None yet
Development

No branches or pull requests

4 participants