Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: raise helpful error message for existing event properties (#1041) #1046

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

xiaotiansu
Copy link
Contributor

@xiaotiansu xiaotiansu commented Mar 14, 2025

Hi, my first commit of code, hope I'm doing it correctly :)

Fixes #

Implemented changes

Insert a description of the changes implemented in the pull request.

  • raise ValueError if the computed property already exists in the event dataframe

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change is or requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • passed 579 tests for dataset: tox -e py39 -- tests/unit/events

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

@github-actions github-actions bot added the enhancement New feature or request label Mar 14, 2025
Copy link

codecov bot commented Mar 14, 2025

Codecov Report

Attention: Patch coverage is 16.00000% with 21 lines in your changes missing coverage. Please review.

Project coverage is 99.42%. Comparing base (35f6b88) to head (310a946).

Files with missing lines Patch % Lines
src/pymovements/events/processing.py 9.52% 15 Missing and 4 partials ⚠️
src/pymovements/dataset/dataset.py 50.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main    #1046      +/-   ##
===========================================
- Coverage   100.00%   99.42%   -0.58%     
===========================================
  Files           80       80              
  Lines         3602     3626      +24     
  Branches       646      658      +12     
===========================================
+ Hits          3602     3605       +3     
- Misses           0       16      +16     
- Partials         0        5       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@dkrako dkrako left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thank you so much for your initiative!

Some of the tests don't pass because event_properties can also be a dict.
I posted a short proposal on how to change your code accordingly.

You will also need to add at least two test cases here:

@pytest.mark.parametrize(
('property_kwargs', 'exception', 'msg_substrings'),
[
pytest.param(
{'event_properties': 'foo'},
pm.exceptions.InvalidProperty,
('foo', 'invalid', 'valid', 'peak_velocity'),
id='invalid_property',
),
],
)
def test_event_dataframe_add_property_raises_exceptions(

One for the ValueError and one for the TypeError.

@dkrako dkrako mentioned this pull request Mar 17, 2025
19 tasks
elif isinstance(event_properties, dict):
event_property_names = event_properties.keys()
elif isinstance(event_properties, tuple):
event_property_names = event_properties[1].keys()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed, that it's probably much easier to reuse the properties of the initialized processor:

self.event_properties: list[tuple[str, dict[str, Any]]] = event_properties_with_kwargs

Here it is already guaranteed that it's a list of tuples, where the first item of each tuple is the name of the property and the second are optional keyword arguments.
Example: [('dispersion', {}), ('location', {'method': 'median'})]

So this way getting all the names is much easier:

event_property_names = [property[0] for property in processor.event_properties]

Sorry that I previously lead you to this more complicated code.


Returns
-------
Dataset
Returns self, useful for method cascading.
"""
if not isinstance(event_properties, (str, tuple, list, dict)):
Copy link
Contributor

@dkrako dkrako Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the type check here to the EventGazeProcessor?

if isinstance(event_properties, (str, tuple)):
event_properties = [event_properties]
event_properties_with_kwargs = []
for event_property in event_properties:
if isinstance(event_property, str):
property_name = event_property
property_kwargs = {}
else:
property_name = event_property[0]
property_kwargs = event_property[1]

This way the checks are already in place if another function uses the processor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the event_properties argument is not that well documented.

Just for clarification, if event_properties is given just as a tuple,
then this form is expected: ('location', {'method': 'median'})

So the first if is for for the two ways of passing a single property, once simply as a string, once with additional kwargs.

If it's a list it would be nice to check that all items in the list are either str or a tuple.

All in all, it would be best to do the checking in the processor like this:

if isinstance(event_properties, str): 
     event_properties = [event_properties]
elif isinstance(event_properties, tuple):
     # here check first, that tuple has a length of two, else raise ValueError
     # then check that first item is str, second item dict, else raise TypeError
     event_properties = [event_properties]
elif isinstance(event_properties, list):
    # here we just check  each list item for correct types
    for event_property in event_properties:
        if not isinstance(event_property, (str, tuple)):
            raise TypeError(...)
        if isinstance(event_property, tuple):
            # here check first, that tuple has a length of two, else raise ValueError
            # then check that first item is str, second item dict, else raise TypeError
else:
    raise TypeError(...) 

This way the error messages will better guide confused users on how to correctly pass the properties.

Also this would then guarantee type safety in the following for loop, where the else would lead to confusing errors if users for example pass a list of dicts:

event_properties_with_kwargs = []
for event_property in event_properties:
if isinstance(event_property, str):
property_name = event_property
property_kwargs = {}
else:
property_name = event_property[0]
property_kwargs = event_property[1]

@@ -42,8 +42,46 @@ class EventProcessor:
"""

def __init__(self, event_properties: str | list[str]):
if isinstance(event_properties, str):
event_properties = [event_properties]
def __init__(self, event_properties: str | list[str]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line here is duplicated thus the init is actually never executed

@xiaotiansu xiaotiansu requested a review from dkrako March 24, 2025 10:34
Copy link
Contributor

@dkrako dkrako left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, this looks quite good already, but please make sure that all tests pass.

There are a few syntax errors left in your code, that's why the tests fail. The error messages in the test runs point you to the error: https://github.com/aeye-lab/pymovements/actions/runs/14032843428?pr=1046

raise ValueError('Tuple must have a length of 2.')
if not isinstance(event_properties[0], str):
raise TypeError(
f"First item of tuple must be a string, but received {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line leads to a syntax error on python 3.9-3.11. the curly braces must be on the same line for these python versions.

Try to format it like you wrote it in Dataset.compute_event_properties() then this should work.

@SiQube
Copy link
Member

SiQube commented Mar 24, 2025

this PR gets screwed over by autopep8, see hhatto/autopep8#765 for the issue. maybe we should deactivate autopep8?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants