
[Bug]: Recommended Chunk Shape doesn't take into account compound dtypes #1122

Open
pauladkisson opened this issue Oct 29, 2024 · 6 comments · May be fixed by #1146

@pauladkisson (Member) commented Oct 29, 2024

What happened?

Dev tests are failing: https://github.com/catalystneuro/neuroconv/actions/runs/11567195585/job/32197117750

I tracked this down to an hdmf update that uncovered these lines: https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/backends/hdf5/h5tools.py#L1427-L1435

And to our chunking recommendation, which is based on the data shape, here: https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py#L263

Notice that in hdmf, if the data has a compound dtype, the shape is (len(data),), whereas in neuroconv the shape is always get_data_shape(data).

This raises an error when the two shapes mismatch, as happens for a Caiman pixel_mask, which has a compound dtype in NWB.
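
For illustration, here is a minimal sketch of the mismatch (the values are made up; pixel_mask rows are (x, y, weight) triples):

import numpy as np
from hdmf.utils import get_data_shape

# pixel_mask data as it typically reaches neuroconv: a list of (x, y, weight) rows
pixel_mask = [(0, 0, 1.0), (1, 1, 0.5), (2, 3, 0.25)]

# neuroconv's chunking recommendation uses the full nested shape
print(get_data_shape(pixel_mask))  # (3, 3)

# hdmf's h5tools treats compound-dtype data as one-dimensional
compound_dtype = np.dtype([("x", "uint32"), ("y", "uint32"), ("weight", "float32")])
print(len(compound_dtype) > 1)  # True, so hdmf uses data_shape = (len(pixel_mask),) = (3,)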

My initial solution was to load the NWB schema to figure out whether a dataset is compound, but I had trouble finding the right code to load the schema...

Steps to Reproduce

n/a

Traceback

No response

Operating System

Linux

Python Executable

Conda

Python Version

3.9

Package Versions

No response

@pauladkisson (Member Author):

Update:
I figured out that I can use the builder's dtype (from io.get_builder(dataset)) to check for compound dtypes, but io.get_builder only works when the nwbfile is read from disk; it returns None when the nwbfile exists only in memory.

@pauladkisson (Member Author):

@rly, when you have a chance, could you provide some guidance on this issue?

How can I get a builder from an in-memory nwbfile? Or, if that is too difficult, how can I get access to the schema for a given neurodata object?

@rly (Contributor) commented Nov 22, 2024

@pauladkisson I do not totally follow what happened here. If I understand correctly, get_data_shape changed in HDMF and this broke something in neuroconv. What was the behavior of get_data_shape for compound data types before the change? Is this a bug in HDMF?

An in-memory (not yet written) NWB file has no builders until you call write. You can use HDMF internal functions to get the spec if you really need to:

import pynwb
manager = pynwb.get_manager()
manager.type_map.namespace_catalog.get_spec("core", "PlaneSegmentation").get_dataset("pixel_mask")

@pauladkisson (Member Author):

If I understand correctly, get_data_shape changed in HDMF and this broke something in neuroconv.

Well, it isn't get_data_shape per se but rather this block of code:

        # define the data shape
        if 'shape' in io_settings:
            data_shape = io_settings.pop('shape')
        elif hasattr(data, 'shape'):
            data_shape = data.shape
        elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
            data_shape = (len(data),)
        else:
            data_shape = get_data_shape(data)
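
For reference, len() on a NumPy dtype counts its fields, which is what the compound check above relies on (field names here are illustrative):

import numpy as np

compound = np.dtype([("x", "uint32"), ("y", "uint32"), ("weight", "float32")])
print(len(compound))             # 3 -> treated as compound, so data_shape = (len(data),)
print(len(np.dtype("float32")))  # 0 -> not compound, falls through to get_data_shape(data)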

I'm not sure exactly how, but some recent changes uncovered the block above so that __list_fill__ runs instead of __scalar_fill__. I had tracked down the responsible PR previously, but unfortunately didn't write it down.

Is this a bug in HDMF?

I don't think so.

pauladkisson linked a pull request (#1146) on Nov 22, 2024 that will close this issue
@rly (Contributor) commented Nov 22, 2024

Another approach that may be more robust and does not require knowledge of the spec, though it does require knowledge of the pynwb object attributes:

import pynwb
from pynwb.testing.mock.ophys import mock_PlaneSegmentation

manager = pynwb.get_manager()
ps = mock_PlaneSegmentation()
manager.type_map.get_map(ps).get_attr_spec("pixel_mask")

@pauladkisson (Member Author) commented Dec 4, 2024

After our meeting, we agreed that the way forward is to build the nwbfile, grab the dtype from the builder, and then work out a mapping between the builders and the neurodata_objects they correspond to.

manager = pynwb.get_manager()
builder = manager.build(nwbfile)
# figure out how to map to the neurodata_object in the builder
dtype = neurodata_object_in_builder.dtype
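
A hypothetical sketch of that plan (not the final neuroconv implementation), assuming BuildManager.get_builder returns the cached builder for any container built as part of the file and that the pynwb mock helpers accept an nwbfile argument. The mock may not include a pixel_mask column by default, so this only demonstrates the builder lookup:

import pynwb
from pynwb.testing.mock.file import mock_NWBFile
from pynwb.testing.mock.ophys import mock_PlaneSegmentation

nwbfile = mock_NWBFile()
ps = mock_PlaneSegmentation(nwbfile=nwbfile)  # assumed to attach the PlaneSegmentation to the file

manager = pynwb.get_manager()
manager.build(nwbfile)                # builds the whole file and caches the builders
ps_builder = manager.get_builder(ps)  # GroupBuilder for the PlaneSegmentation
for name, dataset_builder in ps_builder.datasets.items():
    print(name, dataset_builder.dtype)  # a list of sub-dtypes would mark a compound column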
