Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Create a geometry iterator #319

Open
phargogh opened this issue Apr 21, 2023 · 5 comments
Open

Proposal: Create a geometry iterator #319

phargogh opened this issue Apr 21, 2023 · 5 comments
Labels
proposal Proposals requiring team feedback

Comments

@phargogh
Copy link
Member

phargogh commented Apr 21, 2023

In many different InVEST models, we have a similar pattern of iterating through all features in a layer and then doing something with that feature's geometry. But there is a great deal of variety with how we handle these geometries. One item previously described in natcap/invest#64 is that we don't consistently handle the case where a feature may not have a defined geometry, but there are other common manipulations done as well:

  • We sometimes skip a feature if a geometry is not present
  • We sometimes test the geometry's validity, doing geometry.buffer(0) if not valid
  • We sometimes only use the OGR geometry to build a shapely geometry
  • We sometimes split multipart geometries into individual geometries, such as MultiPoint --> Point.

I propose that we create a geometry iterator that handles the above common cases, and to provide a common entry point to more easily and consistently handle future cases. This is intended to resolve issues with geometries specifically, not to provide a consistent feature iterator. There are plenty of examples in InVEST where we iterate over features solely to create new geometries in a new layer or to only access field values. This proposal is not intended to change that pattern.

I was thinking of an interface that looked something like this:

def itergeometries(vector_path, layer_id=0, skip_missing=False, expand_multipart=False, repair_invalid=False, returns='ogr'):
    ...

Where:

  • skip_missing indicates how to handle missing geometries. If True, a logging.WARNING message is logged, but iteration continues. If False, an Exception is raised.
  • expand_multipart indicates whether multipart geometries should be expanded into their component geometries. If True, MultiPolygon geometry (for example), would yield multiple Polygon geometries. If False, a MultiPolygon geometry would yield a MultiPolygon geometry.
  • repair_invalid, if True, would cause some basic geometry repair (like buffer(0)) on any geometries that GEOS thinks are invalid.
  • returns indicates the target type, which would be either ogr or shapely.

Questions

  • Would this generator be better suited for pygeoprocessing?
  • Should this proposal be expanded into a more general feature iterator and not just be a geometry iterator?
  • Are there any other features that would be useful that have come up in recent work outside of InVEST?
  • Would it be useful to limit features returned to a bounding box or intersection geometry? Do we have enough use cases?
@davemfish
Copy link
Contributor

One common use-case we have for iterating geometries is to populate an rtree. I think the proposed interface would work well for that, but for some cases it's useful to index the rtree by fid. For example, if we want to get field values from a feature after querying the rtree.

So in that case we might want the return value to be something like (fid, geometry). But then it starts raising the question of whether we should just use a general feature iterator for this case. I imagine we want the benefits of the geometry manipulations though.

@phargogh
Copy link
Member Author

Other possibilities to include if we have enough use cases:

  • simplification
  • go back over the InVEST and pygeoprocessing use cases for this to make sure there aren't any others features/uses to roll in here
  • review simplification and whether that makes sense to include
  • transfer the issue to pgp
  • ping emily in the issue so she can take a look (more things can be raised)

sounds like a thumbs up and no design doc needed, just a PR.

@phargogh phargogh transferred this issue from natcap/invest Apr 25, 2023
@phargogh
Copy link
Member Author

@emlys do you have any thoughts or objections that you would like to add to this?

@dcdenu4 dcdenu4 added the proposal Proposals requiring team feedback label Apr 27, 2023
@emlys
Copy link
Member

emlys commented Apr 27, 2023

this sounds good to me! just a couple thoughts -

  • to me, it would be more intuitive to rename skip_missing to something like error_on_missing to clarify that it affects error behavior and not the contents of the iterator
  • returning the FID sounds useful and important, but it kind of conflicts with the expand_multipart option since each part would have the same FID. I think maybe I want both a feature iterator and a geometry iterator?

@phargogh
Copy link
Member Author

Also related, we have a similar issue over in InVEST: natcap/invest#1619. Would the Fiona API address/clarify/simplify the above proposal? We should look into how feature iteration is handled in Fiona, for sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal Proposals requiring team feedback
Projects
None yet
Development

No branches or pull requests

4 participants