Skip to content

Ideas on refactoring the Session.virtualfile_in method #3836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 of 6 tasks
seisman opened this issue Mar 6, 2025 · 0 comments
Open
2 of 6 tasks

Ideas on refactoring the Session.virtualfile_in method #3836

seisman opened this issue Mar 6, 2025 · 0 comments
Assignees
Labels
maintenance Boring but important stuff for the core devs
Milestone

Comments

@seisman
Copy link
Member

seisman commented Mar 6, 2025

The Session.virtualfile_in method is one of the most commonly used low-level API functions when wrapping GMT modules. Currently, its definition signature is as below:

pygmt/pygmt/clib/session.py

Lines 1768 to 1778 in 1791ca2

def virtualfile_in(
self,
check_kind=None,
data=None,
x=None,
y=None,
z=None,
extra_arrays=None,
required_z=False,
required_data=True,
):

Here are some ideas to refactor the method signature:

  1. Remove the extra_arrays parameter (Refer to
    Session.virtualfile_in: Deprecate the parameter 'extra_arrays'. Prepare and pass a dictionary of arrays instead (will be removed in v0.20.0) #3823 (comment) for more detailed reasoning). When a module takes more than 3 columns, we can build a dict of columns instead. This is currently WIP in Session.virtualfile_in: Deprecate the parameter 'extra_arrays'. Prepare and pass a dictionary of arrays instead (will be removed in v0.20.0) #3823.
  2. Remove required_z and add required_ncols (or mincols for a shorter name). In this way, we can check if a table input contains enough number of columns. required_z=True is equivalent to required_ncols=3. This is currently WIP in Session.virtualfile_in: Deprecate parameter 'required_z'. Use 'mincols' instead (will be removed in v0.20.0) #3369
  3. Rename required_data to required. After addressing points 1 and 2, the function definition will be:
 def virtualfile_in( 
     self, 
     check_kind=None, 
     data=None, 
     x=None, 
     y=None, 
     z=None, 
     mincols=2, 
     required_data=True, 
 ): 

I think it makes more sense to rename required_data to required.
4. In some wrappers (currently, plot/plot3d/text/legend/x2sys_cross), we need to call data_kind to decide the data kind. In this way, we already know the data kind before calling Session.virtualfile_in, so it's unnecessary to call data_kind and check if the kind is valid in Session.virtualfile_in. So, what about renaming check_kind to kind? Currently, check_kind can take two values, vector and raster. After renaming, the new kind parameter can accept the following values, in addition to "vector" and "raster"

"arg", "empty", "file", "geojson", "grid", "image", "matrix", "stringio", "vectors"

In this way, "vector" and "raster" can be thought of as two special/general kinds (but please note that there may be confusions for the vector and vectors kinds). Then in the Sesssion.virtualfile_in method, the related codes can be rewritten to something like:

if kind in {"vector", "raster"}:  # Two general kinds
    valid_kinds = ("file", "arg") if required_data is False else ("file",)
    if kind == "raster":
        valid_kinds += ("grid", "image")
    elif kind == "vector":
        valid_kinds += ("empty", "matrix", "vectors", "geojson")
    kind = data_kind(data)  # Determine the actual data kind
    if kind not in valid_kinds:
        msg = f"Unrecognized data type for {check_kind}: {type(data)}."
        raise GMTInvalidInput(msg)
  1. Reorder the parameters
    Originally posted by @weiji14 in Session.virtualfile_in: Deprecate the parameter 'extra_arrays'. Prepare and pass a dictionary of arrays instead (will be removed in v0.20.0) #3823 (comment)

If we're going to break compatibility of virtualfile_in for 4 minor versions, maybe it's a good time to rethink the parameter ordering (ties in with potential changes in #3369). Would it make sense for the signature to be something like:

def virtualfile_in(
    self,
    check_kind=None,
    required_data=True,
    required_z=False,
    data=None,
    x=None,
    y=None,
    z=None,
    **extra_arrays
)

? We could also mark some of those parameters as keyword only (https://peps.python.org/pep-0570/#keyword-only-arguments) to ensure future compatibility (i.e. prevent positional-only parameters/arguments in case we decide to change the order again).

With points 1-4 addressed, maybe the following parameter order is better?

> def virtualfile_in(
>     self,
>     data=None,
>     x=None,
>     y=None,
>     z=None,
>     kind=None,
>     required=True,
>     mincols=2,
> )

TODO

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

No branches or pull requests

2 participants