Skip to content

Implement gmt xarray BackendEntrypoint #3919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 43 commits into from
Apr 26, 2025
Merged

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Apr 17, 2025

Description of proposed changes

Allow 'gmt' to be used as an engine in xarray.open_dataarray and xarray.open_dataset for decoding raster NetCDF (grid) or GeoTIFF (image) files!

TODO:

  • Initial implementation supporting kind="grid"
  • Let pygmt.io.load_dataarray() use engine="gmt" by default
  • Support raster_kind="image"
  • Documentation improvements

References:

Implements idea in #3673 (comment)

Preview: https://pygmt-dev--3919.org.readthedocs.build/en/3919/api/generated/pygmt.GMTBackendEntrypoint.html

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash command is:

  • /format: automatically format and lint the code

Initial implementation of a 'gmtread' xarray BackendEntrypoint for decoding NetCDF files! Following instructions at https://docs.xarray.dev/en/v2025.03.1/internals/how-to-add-new-backend.html on registering a backend. Only a minimal implementation for now to read kind=grid data.
@weiji14 weiji14 added the feature Brand new feature label Apr 17, 2025
So that GMTDataArray accessor knows the path to the original NetCDF file to retrieve correct metadata.
weiji14 added 11 commits April 17, 2025 14:30
Use gmtread as default engine instead of netcdf4.
Default is still kind='grid', by allow use of kind='image' too.
Users will need to set the 'kind' parameter explicitly to 'grid' or 'image'.
Also set `kind='grid'` argument in _load_earth_relief_holes
Because xr.core.types.ReadBuffer is not available until xarray 2024.11.0, xref pydata/xarray#9787
@weiji14
Copy link
Member Author

weiji14 commented Apr 17, 2025

Probably best to split this into two PRs so that we have two changelog entries:

  1. (feature) Implement gmtread xarray BackendEntrypoint
    • Changes to pyproject.toml, xarray_backend.py and tests/test_xarray_backend.py
  2. **Breaking** Set load_dataarray's default engine to engine="gmtread", and require decode_kind parameter to be set (e.g. to decode_kind='grid')
    • Changes to io.py, helpers/testing.py, datasets/samples.py, etc

But please take a look at the current implementation, and suggest comments first.

@weiji14 weiji14 added the needs review This PR has higher priority and needs review. label Apr 17, 2025
@seisman
Copy link
Member

seisman commented Apr 17, 2025

Probably best to split this into two PRs so that we have two changelog entries:

  1. (feature) Implement gmtread xarray BackendEntrypoint

    • Changes to pyproject.toml, xarray_backend.py and tests/test_xarray_backend.py
  2. Breaking Set load_dataarray's default engine to engine="gmtread", and require decode_kind parameter to be set (e.g. to decode_kind='grid')

    • Changes to io.py, helpers/testing.py, datasets/samples.py, etc

Sounds good to me.

But please take a look at the current implementation, and suggest comments first.

Here are my questions:

  1. Why not just use "gmt" as the engine name?
  2. The load_dataarray function is almost the same as the xr.load_dataarray function, except that load_dataarray also get the GMT accessor information. With the "gmtread" engine added, xr.load_dataarray("xxx.grd", engine="gmtread", decode_kind="grid") works exactly the same as load_dataarray("xxx.grd", decode_kind="grid"). So maybe we can deprecate load_dataarray?

@weiji14
Copy link
Member Author

weiji14 commented Apr 17, 2025

  1. Why not just use "gmt" as the engine name?

Yes, we could do that actually, and engine="gmt" will make more sense if we want to implement writing to a NetCDF driver later via xr.DataArray.to_netcdf.

  1. The load_dataarray function is almost the same as the xr.load_dataarray function, except that load_dataarray also get the GMT accessor information. With the "gmtread" engine added, xr.load_dataarray("xxx.grd", engine="gmtread", decode_kind="grid") works exactly the same as load_dataarray("xxx.grd", decode_kind="grid"). So maybe we can deprecate load_dataarray?

True, the GMT accessor loading is already in the BackendEntrypoint, so we might as well deprecate load_dataarray. Ok to do it in 2 releases (remove in v0.18.0) or 4 releases (remove in v0.20.0)?

@weiji14 weiji14 changed the title Implement gmtread xarray BackendEntrypoint Implement gmt xarray BackendEntrypoint Apr 17, 2025
@seisman
Copy link
Member

seisman commented Apr 17, 2025

True, the GMT accessor loading is already in the BackendEntrypoint, so we might as well deprecate load_dataarray. Ok to do it in 2 releases (remove in v0.18.0) or 4 releases (remove in v0.20.0)?

4 releases sounds good to me.

ext = Path(filename_or_obj).suffix
except TypeError:
return False
return ext in {".grd", ".nc", ".tif"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about .tiff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ext in {".grd", ".nc", ".tif"}
return ext in {".grd", ".nc", ".tif", ".tiff"}

Copy link
Member Author

@weiji14 weiji14 Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added .tiff. Any other known extensions to add? E.g. based on https://docs.generic-mapping-tools.org/6.5/reference/file-formats.html#grid-files?

pygmt/io.py Outdated
@@ -2,10 +2,14 @@
PyGMT input/output (I/O) utilities.
"""

from typing import Literal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're going to deprecate load_dataarray(xref: #3919 (comment)), we should revert the changes in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to change the default engine in load_dataarray to engine="gmt" or no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default engine to engine="gmt" means that users have to set decode_kind, so it's a breaking change. It makes little sense to introduce a breaking change to a deprecated function, so my answer is no.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've reverted changes to io.py in commit f722b47. So keeping default engine behaviour as before for the load_dataarray function that will be deprecated in #3922.

@@ -154,7 +154,7 @@ def load_static_earth_relief():
A grid of Earth relief for internal tests.
"""
fname = which("@static_earth_relief.nc", download="c")
return load_dataarray(fname)
return load_dataarray(fname, decode_kind="grid")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, with load_dataarray deprecated, we should use xr.load_dataarray(fname, engine="gmt", decode_kind="grid") instead, but it can be done in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started draft PR at #3922, will cherrypick things across.

@@ -173,6 +173,7 @@ Input/output
.. autosummary::
:toctree: generated

GMTBackendEntrypoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it makes more sense to have a separate category like "Xarray Integration" and put both GMTBackendEntrypoint and GMTDataArrayAccessor in the category. Also, if #3854 is implemented, GMTDataArrayAccessor no longer makes sense in the "Metadata" category.

Copy link
Member Author

@weiji14 weiji14 Apr 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was wondering about this too. I can put GMTBackendEntrypoint in a new "Xarray Integration" section as you suggested, and then we'll remove the I/O section in a separate PR.

Copy link
Member

@seisman seisman Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move GMTDataArrayAccessor to the "Xarray Integration" section in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move GMTDataArrayAccessor to the "Xarray Integration" section in this PR?

I think do that in #3854. Hopefully this PR can be done before that.

@seisman
Copy link
Member

seisman commented Apr 23, 2025

This PR almost looks good, except that the doctest doesn't look good (what's ivar?):
Screenshot from 2025-04-23 10-25-32

@weiji14
Copy link
Member Author

weiji14 commented Apr 23, 2025

This PR almost looks good, except that the doctest doesn't look good (what's ivar?):

Appears to be a sphinx thing. I found mention of a napoleon_use_ivar setting set here (added in #383) that converts the .. attribute:: directive, and because the docstring output has Attributes:, sphinx converts it to ivar 😅

@seisman
Copy link
Member

seisman commented Apr 23, 2025

This PR almost looks good, except that the doctest doesn't look good (what's ivar?):

Appears to be a sphinx thing. I found mention of a napoleon_use_ivar setting set here (added in #383) that converts the .. attribute:: directive, and because the docstring output has Attributes:, sphinx converts it to ivar 😅

The xarray documentation also uses napoleon, but the xarray repr looks correct (e.g., https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html#xarray.DataArray), but I don't find any special settings in the conf.py file (https://github.com/pydata/xarray/blob/main/doc/conf.py).

Comment on lines 45 to 51
Attributes:
Conventions: CF-1.7
title: Produced by grdcut
history: grdcut @earth_relief_01d_p -R-55/-47/-24/-10 -Gstatic_eart...
description: Reduced by Gaussian Cartesian filtering (111.2 km fullwidt...
actual_range: [190. 981.]
long_name: elevation (m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A workaround for issue #3919 (comment)

Suggested change
Attributes:
Conventions: CF-1.7
title: Produced by grdcut
history: grdcut @earth_relief_01d_p -R-55/-47/-24/-10 -Gstatic_eart...
description: Reduced by Gaussian Cartesian filtering (111.2 km fullwidt...
actual_range: [190. 981.]
long_name: elevation (m)
...

Copy link
Member Author

@weiji14 weiji14 Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the ellipisis after the colon in 'Attributes:' line, and it seemed to remove the :ivar from showing up (see commit f2446fc):

image

Only downside is the ellipsis shows as Attributes:..., but should be acceptable.

# `chunks` and `cache` DO NOT go here, they are handled by xarray
) -> xr.Dataset:
"""
Backend open_dataset method used by Xarray in :py:func:`~xarray.open_dataset`.
Copy link
Member

@seisman seisman Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the fix in #3927, sphinx-autogen will generate a stub file for this method. The method documentation is not ideal, at least one "Parameters" section is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering moving this file to pygmt/xarray/backend.py, and then potentially in #3854, we can move pygmt/accessors.py to pygmt/xarray/accessors.py? And if we do pandas accessors (xref #3854 (comment)) in the future, we could put it under pygmt/pandas/accessor.py?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good


Internally, GMT uses the netCDF C library to read netCDF files, and GDAL for GeoTIFF
and other raster formats. See also
:gmt-docs:`reference/features.html#grid-file-format`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a few sentences explaining why users may want to read netCDF/GeoTIFF files via GMT, rather than netcdf4/rasterio, something like:

Compared to the "netcdf4"/"rasterio" engines, the "gmt" engine can read GMT remote files (file names starting with @) directly and provides the GMTDataArray accessor .gmt for easy access to GMT-specific features and metadata.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, updated the description in commit acab8b2, please review. I didn't mention the netcdf4/rasterio engines directly, but just mentioned that the GMT engine works better with GMT remote files and features in general.

@seisman
Copy link
Member

seisman commented Apr 24, 2025

Looks pretty good to me.

@weiji14 weiji14 marked this pull request as ready for review April 24, 2025 04:18
@weiji14 weiji14 added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Apr 24, 2025
@weiji14
Copy link
Member Author

weiji14 commented Apr 24, 2025

Will leave this up for review until end of the week-ish.

@seisman seisman merged commit ecb933c into main Apr 26, 2025
30 of 32 checks passed
@seisman seisman deleted the backendentrypoint/gmtread branch April 26, 2025 01:21
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Apr 26, 2025
weiji14 added a commit that referenced this pull request Apr 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Brand new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants