[GSProcessing] Update GSProcessing custom split config parsing and documentation to match GConstruct. #1117

Open. thvasilo wants to merge 3 commits into main.

thvasilo (Contributor) commented Dec 19, 2024

NOTE: GSProcessing now requires list[str] for its custom split file config input.

Issue #, if available:

Description of changes:

  • Update the GSProcessing documentation for custom split files to match the implementation.
  • Make GSProcessing config parsing equivalent to GConstruct by allowing as few as one of train/val/test to be defined. Previously GSProcessing required all three, but GConstruct did not.
  • Require the custom split file input for GSProcessing to be a list of files. Where possible we want the GSProcessing input config to have a single way to define things, which makes verification easier and reduces user confusion.
  • Also used GenAI to generate some tests for gsprocessing.config classes and functions that were previously untested.
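
A rough sketch of the parsing rule described in the list above, for illustration only. The function name `validate_custom_split` and the key names `train`, `valid`, and `test` are assumptions made for this example and are not taken from the GSProcessing code:

```python
from typing import Any

# Assumed key names for the three splits; the real config keys may differ.
SPLIT_KEYS = ("train", "valid", "test")


def validate_custom_split(config: dict[str, Any]) -> dict[str, list[str]]:
    """Return the splits that are defined, each as a list of file paths.

    Hypothetical helper: accepts any subset of the splits, but rejects a
    bare string and rejects a config that defines no split at all.
    """
    present: dict[str, list[str]] = {}
    for key in SPLIT_KEYS:
        value = config.get(key)
        if value is None:
            continue  # this split is simply not defined
        if not isinstance(value, list) or not all(isinstance(p, str) for p in value):
            raise TypeError(f"'{key}' must be a list[str], got {type(value).__name__}")
        present[key] = value
    if not present:
        raise ValueError("at least one of train/valid/test must be defined")
    return present


# Only a train split is defined, which the new parsing would accept.
print(validate_custom_split({"train": ["splits/train.parquet"]}))
```

Under these assumptions, passing `{"train": "splits/train.parquet"}` raises a `TypeError`, because a single string is no longer accepted directly in the GSProcessing config.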

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

thvasilo added the break back compatibility, ready (able to trigger the CI), gsprocessing (for issues and PRs related to the GSProcessing library), and 0.4 labels Dec 19, 2024
thvasilo requested a review from jalencato December 19, 2024 01:56
thvasilo self-assigned this Dec 19, 2024
thvasilo added this to the 0.4 release milestone Dec 19, 2024
…figs whether they are single str or list[str]
thvasilo force-pushed the gsp-custom-split-configs branch from dba7f4d to 436a886 December 19, 2024 02:17
graphstorm-processing/tests/test_converter.py (resolved thread)
entry_val = label_custom_split_filenames.get(entry, None)
if entry_val:
    if isinstance(entry_val, str):
        label_custom_split_filenames[entry] = [entry_val]

Collaborator:
So do we want to support string input for the GSProcessing config this way?

Contributor Author:
Yes, since GConstruct supports both, we use this to convert to list[str] which is our expectation for GSProcessing.
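
A minimal, self-contained sketch of that conversion step, modeled on the snippet under review. The wrapper function `normalize_split_filenames` and the key names `train`, `valid`, and `test` are assumptions for illustration and are not part of the PR:

```python
def normalize_split_filenames(split_filenames: dict) -> dict:
    """Wrap single-string split entries in a list, leaving lists untouched.

    Hypothetical helper: GConstruct-style input may use a str or a list[str]
    per split, while the converted GSProcessing config always uses list[str].
    """
    normalized = dict(split_filenames)  # shallow copy, input is not mutated
    for entry in ("train", "valid", "test"):  # assumed split key names
        entry_val = normalized.get(entry)
        if isinstance(entry_val, str):
            normalized[entry] = [entry_val]
    return normalized


gconstruct_style = {"train": "train.parquet", "valid": ["v1.parquet", "v2.parquet"]}
print(normalize_split_filenames(gconstruct_style))
```

Splits that are absent stay absent, so a config that defines only one split passes through unchanged apart from the string-to-list wrapping.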

Collaborator:
Right. But I suppose the code here only converts the GConstruct config. Do we also support string input in the GSProcessing config itself?

Contributor Author:
I would prefer having a single way to do things to avoid confusing users. This goes along with the Python principle of "There should be one-- and preferably only one --obvious way to do it." https://peps.python.org/pep-0020/
