Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dataset: add RaCCooNS by Frank and Aumeistere #961

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

SiQube
Copy link
Member

@SiQube SiQube commented Feb 20, 2025

resolves #954

requires #989

Copy link

codecov bot commented Feb 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (cc72500) to head (c6de72c).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #961   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           79        80    +1     
  Lines         3567      3587   +20     
  Branches       622       622           
=========================================
+ Hits          3567      3587   +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@SiQube SiQube enabled auto-merge (squash) February 20, 2025 14:07
@SiQube
Copy link
Member Author

SiQube commented Feb 21, 2025

there is additional raw gaze data. unfortunately, the files are ascii and we currently only support csv, tsv and feather data, see #962

@dkrako
Copy link
Contributor

dkrako commented Feb 25, 2025

Isn't this just documentation related? ToyDatasetEyeLink contains only ascii data and you should be able to load it without issues.

@SiQube
Copy link
Member Author

SiQube commented Feb 25, 2025

yes, but we have to have the right parsing criterions. I'd want to do this in a seperate PR and reopen an issue. maybe someone from the original authors can help?

@dkrako
Copy link
Contributor

dkrako commented Feb 25, 2025

Ah I see, it's not the EyeLink format then? can you provide some example lines?

@SiQube
Copy link
Member Author

SiQube commented Feb 27, 2025

the files are actually eyelink ascii files. however I am not entirely sure how trials are split and I don't want to propagate mistakes. maybe we can setup a meeting with Stefan Frank where we can discuss how the trials were split? CC @izaskr

@izaskr
Copy link
Collaborator

izaskr commented Feb 27, 2025

Alright, let me check. Will get back to you with this.

@SiQube
Copy link
Member Author

SiQube commented Mar 3, 2025

from @izaskr:

There's a variable TRIAL_INDEX that is increased by 1 at the end of each trial.
If you look for this in the text files you find the separations between consecutive trial.
I hope that answers David's question!

I'll implement it and then we can merge the dataset to pymovements.

@saeub
Copy link
Collaborator

saeub commented Mar 5, 2025

@SiQube I'll take care of the ASCII parsing (hopefully today)

@saeub saeub mentioned this pull request Mar 5, 2025
14 tasks
@saeub
Copy link
Collaborator

saeub commented Mar 5, 2025

I see three problems:

  1. Participant IDs in TSV files are numbers, but not all participant IDs in ASC filenames are in a number format (e.g. 001_2).
  2. Trial IDs in the ASC files are offset by 1 compared to the TSV files, meaning the user won't be able to join the event and gaze frames.
  3. Trial variable messages (!V TRIAL_VAR) are at the end of the trial, meaning we can't add info like stimulus ID to the gaze dataframe.

Regarding 2., I guess we can't really fix that right now? @SiQube (One option would be to allow passing functions with patterns in from_asc(), where a function takes the match.groupdict() and returns a dictionary of column values; but this would not be possible in a YAML definition, right?) For the moment, I just used a different column name trial_index0 in the gaze frame to make clear that it's a 0-based index.

For 3., I created an issue #990.

@saeub saeub disabled auto-merge March 5, 2025 20:26
@saeub saeub marked this pull request as draft March 5, 2025 20:27
@SiQube
Copy link
Member Author

SiQube commented Mar 5, 2025

maybe we can discuss with Frank and @izaskr ?

@dkrako
Copy link
Contributor

dkrako commented Mar 6, 2025

Regarding 2., I guess we can't really fix that right now? @SiQube (One option would be to allow passing functions with patterns in from_asc(), where a function takes the match.groupdict() and returns a dictionary of column values; but this would not be possible in a YAML definition, right?) For the moment, I just used a different column name trial_index0 in the gaze frame to make clear that it's a 0-based index.

I have the feeling that we will probably need some custom loading functions for specific datasets that we want to add in the future. This way we can stay flexible for cases like this where we need to postprocess data.

This is a bit of a bummer but we really can't avoid that as datasets vary in their data standards.
We can still keep DatasetDefinition classes for such cases. There we can add a custom_post_processing() method or so.
This way #914 wouldn't be blocked.

Also #352 (adding mat-file support) could benefit from custom pre/post processing functions.

@saeub
Copy link
Collaborator

saeub commented Mar 6, 2025

@SiQube Should we wait with merging this until we have a solution for parsing trial variables (#990) and custom postprocessing functions (#961 (comment))? Or do you want to merge this now and improve it later?

(I think the dataset is perfectly usable as it is now, just some of the information about the stimulus is missing, and the gaze and precomputed event frames don't match up.)

@saeub saeub marked this pull request as ready for review March 6, 2025 13:26
@SiQube
Copy link
Member Author

SiQube commented Mar 6, 2025

I don't mind merging it asap and fix it later => we can move one of the comments to a new issue. using pymovements for this dataset is still valuable (I think) since downloading and preprocessing works (for most of the data)

@dkrako
Copy link
Contributor

dkrako commented Mar 6, 2025

I'm in favor of merging this PR without the additional trial infos and improve on this later on when we have solved the underlying issues.

Also, adding a custom_post_processing() to a DatasetDefinition wouldn't help much when a user simply uses gaze.from_asc() to load a single file. #997 could help with this though.

@dkrako
Copy link
Contributor

dkrako commented Mar 6, 2025

We should probably also mention these issues in the docstring of the dataset.

Copy link
Contributor

@dkrako dkrako left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, lets merge this.

I just have two comments to make the classes clickable in the documentation (following #986).

@SiQube you can resolve the comments without changes in code if you like, as these probably aren't relevant anymore after merging #914


Examples
--------
Initialize your :py:class:`~pymovements.PublicDataset` object with the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PublicDataset does not exist anymore. Use

:py:class:`~pymovements.dataset.Dataset`

Examples
--------
Initialize your :py:class:`~pymovements.PublicDataset` object with the
:py:class:`~pymovements.RaCCooNS` definition:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use

:py:class:`~pymovements.datasets.RaCCooNS`

@dkrako dkrako enabled auto-merge (squash) March 6, 2025 14:31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I forgot one thing: can you add a line for the dataset to docs/source/datasets/public_datasets.csv.

Copy link
Contributor

@dkrako dkrako left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a yaml definition?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add the RaCCooNS dataset
4 participants