Implement gmt xarray BackendEntrypoint #3919
Initial implementation of a 'gmtread' xarray BackendEntrypoint for decoding NetCDF files! Following instructions at https://docs.xarray.dev/en/v2025.03.1/internals/how-to-add-new-backend.html on registering a backend. Only a minimal implementation for now to read kind=grid data.
So that GMTDataArray accessor knows the path to the original NetCDF file to retrieve correct metadata.
Use gmtread as default engine instead of netcdf4.
Default is still kind='grid', but allow use of kind='image' too.
Users will need to set the 'kind' parameter explicitly to 'grid' or 'image'.
Also set `kind='grid'` argument in _load_earth_relief_holes
Need to set a default value of None for 'decode_kind' it seems, from reading https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html#open-dataset
Because xr.core.types.ReadBuffer is not available until xarray 2024.11.0, xref pydata/xarray#9787
Partially reverts b9e3eb4
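The commit notes above say that the kind must be set explicitly to 'grid' or 'image', while the keyword itself needs a default of None. A minimal sketch of how those two requirements might fit together follows. The helper name `check_decode_kind` and the use of `ValueError` are assumptions for illustration, not the PR's actual code.

```python
from typing import Optional

# The two kinds named in the commit notes.
VALID_KINDS = ("grid", "image")


def check_decode_kind(decode_kind: Optional[str] = None) -> str:
    """Return decode_kind unchanged if valid; raise ValueError otherwise.

    The default is None so the keyword can sit in a backend's open_dataset
    signature, yet callers are still forced to make an explicit choice.
    """
    if decode_kind not in VALID_KINDS:
        msg = f"Invalid decode_kind {decode_kind!r}; expected 'grid' or 'image'."
        raise ValueError(msg)
    return decode_kind
```

Calling `check_decode_kind()` with no argument raises, which mirrors the "users will need to set the parameter explicitly" behaviour described in the notes.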
Probably best to split this into two PRs so that we have two changelog entries:
But please take a look at the current implementation, and suggest comments first.
Sounds good to me.
Here are my questions:
Yes, we could do that actually, and
True, the GMT accessor loading is already in the BackendEntrypoint, so we might as well deprecate |
4 releases sounds good to me.
pygmt/xarray_backend.py
Outdated
    ext = Path(filename_or_obj).suffix
except TypeError:
    return False
return ext in {".grd", ".nc", ".tif"}
What about `.tiff`?
- return ext in {".grd", ".nc", ".tif"}
+ return ext in {".grd", ".nc", ".tif", ".tiff"}
Ok, added `.tiff`. Any other known extensions to add? E.g. based on https://docs.generic-mapping-tools.org/6.5/reference/file-formats.html#grid-files?
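For reference, the extension check discussed in this thread can be written as a standalone function. This is a sketch based on the diff fragment above, with `.tiff` included; the module-level constant `SUPPORTED_EXTENSIONS` and the standalone function form are assumptions for illustration (in the PR the check lives on the backend class).

```python
from pathlib import Path

# Extensions taken from the review thread above; the set may need extending.
SUPPORTED_EXTENSIONS = {".grd", ".nc", ".tif", ".tiff"}


def guess_can_open(filename_or_obj) -> bool:
    """Guess from the file extension whether the file can be opened."""
    try:
        ext = Path(filename_or_obj).suffix
    except TypeError:
        # Not path-like (e.g. bytes or None), so decline to open it.
        return False
    return ext in SUPPORTED_EXTENSIONS
```

Note that the comparison is case-sensitive as written, so a name ending in `.TIF` would not match.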
Co-authored-by: Dongdong Tian <[email protected]>
pygmt/io.py
Outdated
@@ -2,10 +2,14 @@
PyGMT input/output (I/O) utilities.
"""

from typing import Literal
Since we're going to deprecate `load_dataarray` (xref: #3919 (comment)), we should revert the changes in this file.
Do we want to change the default engine in `load_dataarray` to `engine="gmt"` or not?
Changing the default engine to `engine="gmt"` means that users have to set `decode_kind`, so it's a breaking change. It makes little sense to introduce a breaking change to a deprecated function, so my answer is no.
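To illustrate the reasoning, here is a rough sketch of a deprecated loader that warns but leaves its defaults alone. Everything in it is hypothetical: `_open_with_default_engine` is a stand-in for the existing loading logic, and the warning text and the choice of `FutureWarning` are assumptions rather than what the project settled on.

```python
import warnings


def _open_with_default_engine(filename, **kwargs):
    """Hypothetical stand-in for the existing loading logic."""
    return {"filename": filename, **kwargs}


def load_dataarray(filename, **kwargs):
    """Deprecated loader: warns, but keeps its defaults unchanged."""
    warnings.warn(
        "load_dataarray is deprecated; use xarray.load_dataarray with "
        "engine='gmt' and an explicit decode_kind instead.",
        FutureWarning,
        stacklevel=2,
    )
    # No engine or decode_kind is injected here, so existing calls keep working.
    return _open_with_default_engine(filename, **kwargs)
```

Because nothing new is required from the caller, code that already uses the function only gains a warning during the deprecation period.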
pygmt/helpers/testing.py
Outdated
@@ -154,7 +154,7 @@ def load_static_earth_relief():
     A grid of Earth relief for internal tests.
     """
     fname = which("@static_earth_relief.nc", download="c")
-    return load_dataarray(fname)
+    return load_dataarray(fname, decode_kind="grid")
Again, with `load_dataarray` deprecated, we should use `xr.load_dataarray(fname, engine="gmt", decode_kind="grid")` instead, but it can be done in a separate PR.
Started draft PR at #3922, will cherrypick things across.
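Applied to the test helper from the diff above, the suggested replacement might look like the sketch below. It uses only the calls quoted in this thread (`which` and `xr.load_dataarray` with `engine="gmt"` and `decode_kind="grid"`); placing the imports inside the function is a choice made for this example, not something taken from the PR.

```python
def load_static_earth_relief():
    """Load the static Earth relief grid through the "gmt" xarray engine."""
    # Imported inside the function so this sketch can be defined even where
    # PyGMT and GMT are not installed; calling it requires both.
    import xarray as xr
    from pygmt import which

    # Download the remote file to the cache directory, as in the original helper.
    fname = which("@static_earth_relief.nc", download="c")
    return xr.load_dataarray(fname, engine="gmt", decode_kind="grid")
```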
doc/api/index.rst
Outdated
@@ -173,6 +173,7 @@ Input/output
.. autosummary::
    :toctree: generated

    GMTBackendEntrypoint
I'm wondering if it makes more sense to have a separate category like "Xarray Integration" and put both `GMTBackendEntrypoint` and `GMTDataArrayAccessor` in the category. Also, if #3854 is implemented, `GMTDataArrayAccessor` no longer makes sense in the "Metadata" category.
Yeah, I was wondering about this too. I can put `GMTBackendEntrypoint` in a new "Xarray Integration" section as you suggested, and then we'll remove the I/O section in a separate PR.
Move `GMTDataArrayAccessor` to the "Xarray Integration" section in this PR?
> Move `GMTDataArrayAccessor` to the "Xarray Integration" section in this PR?

I think we should do that in #3854. Hopefully this PR can be done before that.
Appears to be a sphinx thing. I found mention of a |
The xarray documentation also uses napoleon, but the xarray repr looks correct (e.g., https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html#xarray.DataArray), but I don't find any special settings in the |
pygmt/xarray_backend.py
Outdated
Attributes:
    Conventions: CF-1.7
    title: Produced by grdcut
    history: grdcut @earth_relief_01d_p -R-55/-47/-24/-10 -Gstatic_eart...
    description: Reduced by Gaussian Cartesian filtering (111.2 km fullwidt...
    actual_range: [190. 981.]
    long_name: elevation (m)
A workaround for issue #3919 (comment)
Attributes:
    Conventions: CF-1.7
    title: Produced by grdcut
    history: grdcut @earth_relief_01d_p -R-55/-47/-24/-10 -Gstatic_eart...
    description: Reduced by Gaussian Cartesian filtering (111.2 km fullwidt...
    actual_range: [190. 981.]
    long_name: elevation (m)
    ...
I put the ellipsis after the colon in the 'Attributes:' line, and it seemed to stop the `:ivar` from showing up (see commit f2446fc):
The only downside is that the ellipsis shows as `Attributes:...`, but that should be acceptable.
pygmt/xarray_backend.py
Outdated
    # `chunks` and `cache` DO NOT go here, they are handled by xarray
) -> xr.Dataset:
    """
    Backend open_dataset method used by Xarray in :py:func:`~xarray.open_dataset`.
With the fix in #3927, sphinx-autogen will generate a stub file for this method. The method documentation is not ideal; at least a "Parameters" section is needed.
pygmt/xarray_backend.py
Outdated
Considering moving this file to `pygmt/xarray/backend.py`, and then potentially in #3854, we can move `pygmt/accessors.py` to `pygmt/xarray/accessors.py`? And if we do pandas accessors (xref #3854 (comment)) in the future, we could put it under `pygmt/pandas/accessor.py`?
Sounds good
Xref #2620
pygmt/xarray/backend.py
Outdated

    Internally, GMT uses the netCDF C library to read netCDF files, and GDAL for GeoTIFF
    and other raster formats. See also
    :gmt-docs:`reference/features.html#grid-file-format`.
Maybe add a few sentences explaining why users may want to read netCDF/GeoTIFF files via GMT, rather than netcdf4/rasterio, something like:
Compared to the "netcdf4"/"rasterio" engines, the "gmt" engine can read GMT remote files (file names starting with `@`) directly and provides the GMTDataArray accessor `.gmt` for easy access to GMT-specific features and metadata.
Ok, updated the description in commit acab8b2, please review. I didn't mention the `netcdf4`/`rasterio` engines directly, but just mentioned that the GMT engine works better with GMT remote files and features in general.
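A short sketch of the two benefits mentioned in this thread: opening a GMT remote file by its `@` name and reading metadata through the `.gmt` accessor. The keyword name `decode_kind` follows the review comments here (the PR description also mentions `raster_kind`, so the final name may differ), and the `registration` and `gtype` accessor properties are assumed to behave as in released PyGMT versions. The function name `open_remote_grid` is hypothetical.

```python
def open_remote_grid(fname="@static_earth_relief.nc"):
    """Open a GMT remote file via the "gmt" engine and read accessor metadata."""
    # Imported lazily so the sketch can be defined without GMT installed.
    import pygmt  # noqa: F401  (importing PyGMT registers the .gmt accessor)
    import xarray as xr

    # The remote file name (leading "@") is passed straight to the engine.
    grid = xr.open_dataarray(fname, engine="gmt", decode_kind="grid")
    # GMT-specific metadata exposed by the accessor.
    return grid, grid.gmt.registration, grid.gmt.gtype
```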
Co-authored-by: Dongdong Tian <[email protected]>
Looks pretty good to me.
Will leave this up for review until end of the week-ish.
Description of proposed changes
Allow 'gmt' to be used as an engine in `xarray.open_dataarray` and `xarray.open_dataset` for decoding raster NetCDF (grid) or GeoTIFF (image) files!
TODO:
- `kind="grid"`
- `pygmt.io.load_dataarray()` use `engine="gmt"` by default
- `raster_kind="image"`
References:
Implements idea in #3673 (comment)
Preview: https://pygmt-dev--3919.org.readthedocs.build/en/3919/api/generated/pygmt.GMTBackendEntrypoint.html
Reminders
- Run `make format` and `make check` to make sure the code follows the style guide.
- `doc/api/index.rst`.
Slash Commands
You can write slash commands (`/command`) in the first line of a comment to perform specific operations. Supported slash command is:
- `/format`: automatically format and lint the code