dataset: add RaCCooNS by Frank and Aumeistere #961

SiQube · 2025-02-20T13:23:17Z

resolves #954

requires #989

codecov · 2025-02-20T13:27:27Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (cc72500) to head (c6de72c).

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #961   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           79        80    +1     
  Lines         3567      3587   +20     
  Branches       622       622           
=========================================
+ Hits          3567      3587   +20

SiQube · 2025-02-21T13:58:30Z

there is additional raw gaze data. unfortunately, the files are ascii and we currently only support csv, tsv and feather data, see #962

dkrako · 2025-02-25T19:01:16Z

Isn't this just documentation related? ToyDatasetEyeLink contains only ascii data and you should be able to load it without issues.

SiQube · 2025-02-25T19:09:40Z

yes, but we have to have the right parsing criteria. I'd want to do this in a separate PR and reopen an issue. maybe someone from the original authors can help?

dkrako · 2025-02-25T19:22:21Z

Ah I see, it's not the EyeLink format then? can you provide some example lines?

SiQube · 2025-02-27T14:56:27Z

the files are actually eyelink ascii files. however I am not entirely sure how trials are split and I don't want to propagate mistakes. maybe we can set up a meeting with Stefan Frank where we can discuss how the trials were split? CC @izaskr

izaskr · 2025-02-27T15:03:46Z

Alright, let me check. Will get back to you with this.

SiQube · 2025-03-03T08:20:43Z

from @izaskr:

There's a variable TRIAL_INDEX that is increased by 1 at the end of each trial.
If you look for this in the text files you find the separations between consecutive trials.
I hope that answers David's question!

I'll implement it and then we can merge the dataset to pymovements.
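
A minimal sketch of that splitting rule, for illustration only. It assumes the variable appears as a Data Viewer style message (`MSG <timestamp> !V TRIAL_VAR TRIAL_INDEX <n>`) at the end of each trial; that layout and the `split_trials` helper are assumptions and not part of pymovements.

```python
import re

# Assumed message layout (not confirmed in this thread):
# "MSG <timestamp> !V TRIAL_VAR TRIAL_INDEX <n>", written at the end of a trial.
TRIAL_END = re.compile(
    r"^MSG\s+\d+\s+!V TRIAL_VAR TRIAL_INDEX\s+(?P<trial_index>\d+)\s*$"
)


def split_trials(lines):
    """Group lines into trials, closing a trial at each TRIAL_INDEX message."""
    trials = {}
    current = []
    for line in lines:
        current.append(line)
        match = TRIAL_END.match(line)
        if match:
            # The message closes the trial, so it stays with the lines before it.
            trials[int(match["trial_index"])] = current
            current = []
    return trials


# Made-up sample lines, only to exercise the helper.
sample = [
    "100 512.0 384.0 1000.0",
    "101 513.0 384.5 1000.0",
    "MSG 102 !V TRIAL_VAR TRIAL_INDEX 1",
    "103 600.0 400.0 1000.0",
    "MSG 104 !V TRIAL_VAR TRIAL_INDEX 2",
]
trials = split_trials(sample)
```

Lines after the last TRIAL_INDEX message are dropped in this sketch, since no message closes them.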

saeub · 2025-03-05T09:51:57Z

@SiQube I'll take care of the ASCII parsing (hopefully today)

saeub · 2025-03-05T19:31:27Z

I see three problems:

1. Participant IDs in TSV files are numbers, but not all participant IDs in ASC filenames are in a number format (e.g. 001_2).
2. Trial IDs in the ASC files are offset by 1 compared to the TSV files, meaning the user won't be able to join the event and gaze frames.
3. Trial variable messages (!V TRIAL_VAR) are at the end of the trial, meaning we can't add info like stimulus ID to the gaze dataframe.

Regarding 2., I guess we can't really fix that right now? @SiQube (One option would be to allow passing functions with patterns in from_asc(), where a function takes the match.groupdict() and returns a dictionary of column values; but this would not be possible in a YAML definition, right?) For the moment, I just used a different column name trial_index0 in the gaze frame to make clear that it's a 0-based index.

For 3., I created an issue #990.
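
To make the option for 2. concrete, here is a rough sketch of pairing a pattern with a function that receives `match.groupdict()` and returns column values. Everything in it is hypothetical: the `TRIALID` message text, `parse_message()` and `to_one_based()` are made up for illustration and are not part of `from_asc()`. It also assumes the TSV trial IDs are 1-based.

```python
import re

# Hypothetical pattern; the real ASC trial messages may look different.
PATTERN = re.compile(r"TRIALID (?P<trial_index0>\d+)")


def to_one_based(groups):
    """Map the regex groups of one match to column values (1-based trial ID)."""
    return {"trial_id": int(groups["trial_index0"]) + 1}


def parse_message(message, pattern, transform):
    """Return column values for a message, or None if it does not match."""
    match = pattern.search(message)
    if match is None:
        return None
    return transform(match.groupdict())


columns = parse_message("MSG 2041 TRIALID 0", PATTERN, to_one_based)
```

A callable like `to_one_based` is plain Python, which is why it could not be written directly in a YAML definition.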

SiQube · 2025-03-05T21:27:15Z

maybe we can discuss with Frank and @izaskr ?

dkrako · 2025-03-06T13:15:22Z

> Regarding 2., I guess we can't really fix that right now? @SiQube (One option would be to allow passing functions with patterns in from_asc(), where a function takes the match.groupdict() and returns a dictionary of column values; but this would not be possible in a YAML definition, right?) For the moment, I just used a different column name trial_index0 in the gaze frame to make clear that it's a 0-based index.

I have the feeling that we will probably need some custom loading functions for specific datasets that we want to add in the future. This way we can stay flexible for cases like this where we need to postprocess data.

This is a bit of a bummer but we really can't avoid that as datasets vary in their data standards.
We can still keep DatasetDefinition classes for such cases. There we can add a custom_post_processing() method or so.
This way #914 wouldn't be blocked.

Also #352 (adding mat-file support) could benefit from custom pre/post processing functions.
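
A rough sketch of what such a hook could look like. The classes and the `load()` function below are stand-ins invented for illustration, not the actual pymovements `DatasetDefinition` or loading code, and the 1-based `trial_id` column is an assumption about how the frames would be aligned.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalDefinition:
    """Stand-in for a dataset definition; not the pymovements class."""

    name: str

    def custom_post_processing(self, rows):
        """Default hook: return the loaded rows unchanged."""
        return rows


@dataclass
class HypotheticalRaCCooNS(HypotheticalDefinition):
    """Definition that overrides the hook for one dataset."""

    name: str = "RaCCooNS"

    def custom_post_processing(self, rows):
        """Add a 1-based trial_id next to the 0-based trial_index0."""
        return [{**row, "trial_id": row["trial_index0"] + 1} for row in rows]


def load(definition, rows):
    """Pretend loader: run the dataset-specific hook after reading the rows."""
    return definition.custom_post_processing(rows)


processed = load(HypotheticalRaCCooNS(), [{"trial_index0": 0}, {"trial_index0": 1}])
```

Datasets without special needs would inherit the no-op default, so only the exceptions carry extra code.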

saeub · 2025-03-06T13:26:33Z

@SiQube Should we wait with merging this until we have a solution for parsing trial variables (#990) and custom postprocessing functions (#961 (comment))? Or do you want to merge this now and improve it later?

(I think the dataset is perfectly usable as it is now, just some of the information about the stimulus is missing, and the gaze and precomputed event frames don't match up.)

SiQube · 2025-03-06T13:29:07Z

I don't mind merging it asap and fixing it later => we can move one of the comments to a new issue. using pymovements for this dataset is still valuable (I think) since downloading and preprocessing works (for most of the data)

dkrako · 2025-03-06T13:29:18Z

I'm in favor of merging this PR without the additional trial info and improving on this later on when we have solved the underlying issues.

Also, adding a custom_post_processing() to a DatasetDefinition wouldn't help much when a user simply uses gaze.from_asc() to load a single file. #997 could help with this though.

dkrako · 2025-03-06T13:41:10Z

We should probably also mention these issues in the docstring of the dataset.

dkrako

Alright, let's merge this.

I just have two comments to make the classes clickable in the documentation (following #986).

@SiQube you can resolve the comments without changes in code if you like, as these probably aren't relevant anymore after merging #914

dkrako · 2025-03-06T14:26:28Z

src/pymovements/datasets/raccoons.py

+
+    Examples
+    --------
+    Initialize your :py:class:`~pymovements.PublicDataset` object with the


PublicDataset does not exist anymore. Use

:py:class:`~pymovements.dataset.Dataset`

dkrako · 2025-03-06T14:26:41Z

src/pymovements/datasets/raccoons.py

+    Examples
+    --------
+    Initialize your :py:class:`~pymovements.PublicDataset` object with the
+    :py:class:`~pymovements.RaCCooNS` definition:


use

:py:class:`~pymovements.datasets.RaCCooNS`

dkrako · 2025-03-06T14:37:31Z

docs/source/bibliography.bib

Ah I forgot one thing: can you add a line for the dataset to docs/source/datasets/public_datasets.csv?

dkrako

can you add a yaml definition?

dataset: add RaCCooNS by Frank and Aumeistere

248ffc1

SiQube requested review from dkrako and prassepaul as code owners February 20, 2025 13:23

github-actions bot added the dataset label Feb 20, 2025

SiQube added 2 commits February 20, 2025 14:30

update docs

79bce43

add gaze data

8758e69

SiQube enabled auto-merge (squash) February 20, 2025 14:07

Merge branch 'main' into RaCCooNS

7d47393

Fix encoding for precomputed events and reading measures

fe89ad0

saeub mentioned this pull request Mar 5, 2025

Add encoding argument to from_asc() #989

Merged

14 tasks

saeub added 2 commits March 5, 2025 21:18

Parse gaze files

e9acc5c

Merge remote-tracking branch 'upstream/main' into RaCCooNS

2b10779

saeub disabled auto-merge March 5, 2025 20:26

saeub marked this pull request as draft March 5, 2025 20:27

Merge branch 'main' into RaCCooNS

e36a568

saeub marked this pull request as ready for review March 6, 2025 13:26

Merge branch 'main' into RaCCooNS

c6de72c

dkrako mentioned this pull request Mar 6, 2025

support custom pre or post load functions for specific datasets #998

Open

dkrako approved these changes Mar 6, 2025

dkrako enabled auto-merge (squash) March 6, 2025 14:31

dkrako reviewed Mar 6, 2025

dkrako requested changes Mar 17, 2025
