
[Bug]: write_recording() fails with NWBZarrIO backend #202

Closed

alejoe91 opened this issue Nov 21, 2022 · 17 comments

@alejoe91 (Contributor) commented Nov 21, 2022

What happened?

When trying to write a recording object using the NWBZarrIO backend from hdmf-zarr, I get the following error, related to calling len() on the SpikeInterfaceRecordingDataChunkIterator.

The same nwbfile does not produce any error when writing to HDF5.

@oruebel @rly I think this might be due to the way neuroconv handles the data iteration, but tagging you here in case it's a Zarr backend issue.

Steps to Reproduce

from pynwb import NWBFile
from hdmf_zarr.nwb import NWBZarrIO
from datetime import datetime
from neuroconv.tools.spikeinterface import write_recording

import spikeinterface.full as si


# create toy objects
rec, sort = si.toy_example()

# instantiate nwbfile
nwb_metadata = dict(session_description="toy_example", identifier="tpy", session_start_time=datetime.now())
nwbfile = NWBFile(**nwb_metadata)

# write recording
write_recording(rec, nwbfile=nwbfile)

# write to Zarr
with NWBZarrIO('test_recording_zarr.nwb', "w") as io:
    io.write(nwbfile)

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:904, in ZarrIO.__list_fill__(self, parent, name, data, options)
    903 try:
--> 904     dset[:] = data  # If data is an h5py.Dataset then this will copy the data
    905 # For compound data types containing strings Zarr sometimes does not like wirting multiple values
    906 # try to write them one-at-a-time instead then

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/zarr/core.py:1373, in Array.__setitem__(self, selection, value)
   1372 else:
-> 1373     self.set_basic_selection(pure_selection, value, fields=fields)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/zarr/core.py:1468, in Array.set_basic_selection(self, selection, value, fields)
   1467 else:
-> 1468     return self._set_basic_selection_nd(selection, value, fields=fields)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/zarr/core.py:1772, in Array._set_basic_selection_nd(self, selection, value, fields)
   1770 indexer = BasicIndexer(selection, self)
-> 1772 self._set_selection(indexer, value, fields=fields)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/zarr/core.py:1800, in Array._set_selection(self, indexer, value, fields)
   1799         value = np.asanyarray(value, like=self._meta_array)
-> 1800     check_array_shape('value', value, sel_shape)
   1802 # iterate over chunks in range

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/zarr/util.py:547, in check_array_shape(param, array, shape)
    546 if array.shape != shape:
--> 547     raise ValueError('parameter {!r}: expected array with shape {!r}, got {!r}'
    548                      .format(param, shape, array.shape))

ValueError: parameter 'value': expected array with shape (300000, 4), got ()

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In [22], line 2
      1 with NWBZarrIO('nwb-test-files/test_recording_zarr.nwb', "w") as io:
----> 2     io.write(nwbfile)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:151, in ZarrIO.write(self, **kwargs)
    149 """Overwrite the write method to add support for caching the specification"""
    150 cache_spec = popargs('cache_spec', kwargs)
--> 151 super(ZarrIO, self).write(**kwargs)
    152 if cache_spec:
    153     self.__cache_spec()

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/backends/io.py:51, in HDMFIO.write(self, **kwargs)
     49 container = popargs('container', kwargs)
     50 f_builder = self.__manager.build(container, source=self.__source, root=True)
---> 51 self.write_builder(f_builder, **kwargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:248, in ZarrIO.write_builder(self, **kwargs)
    246 f_builder, link_data, exhaust_dci = getargs('builder', 'link_data', 'exhaust_dci', kwargs)
    247 for name, gbldr in f_builder.groups.items():
--> 248     self.write_group(parent=self.__file,
    249                      builder=gbldr,
    250                      link_data=link_data,
    251                      exhaust_dci=exhaust_dci)
    252 for name, dbldr in f_builder.datasets.items():
    253     self.write_dataset(parent=self.__file,
    254                        builder=dbldr,
    255                        link_data=link_data,
    256                        exhaust_dci=exhaust_dci)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:283, in ZarrIO.write_group(self, **kwargs)
    281 if subgroups:
    282     for subgroup_name, sub_builder in subgroups.items():
--> 283         self.write_group(parent=group,
    284                          builder=sub_builder,
    285                          link_data=link_data,
    286                          exhaust_dci=exhaust_dci)
    288 datasets = builder.datasets
    289 if datasets:

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:291, in ZarrIO.write_group(self, **kwargs)
    289 if datasets:
    290     for dset_name, sub_builder in datasets.items():
--> 291         self.write_dataset(parent=group,
    292                            builder=sub_builder,
    293                            link_data=link_data,
    294                            exhaust_dci=exhaust_dci)
    296 # write all links (haven implemented)
    297 links = builder.links

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:736, in ZarrIO.write_dataset(self, **kwargs)
    734     self.__dci_queue.append(dataset=dset, data=data)
    735 elif hasattr(data, '__len__'):
--> 736     dset = self.__list_fill__(parent, name, data, options)
    737 else:
    738     dset = self.__scalar_fill__(parent, name, data, options)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:908, in ZarrIO.__list_fill__(self, parent, name, data, options)
    905     # For compound data types containing strings Zarr sometimes does not like wirting multiple values
    906     # try to write them one-at-a-time instead then
    907     except ValueError:
--> 908         for i in range(len(data)):
    909             dset[i] = data[i]
    910 return dset

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/data_utils.py:1028, in DataIO.__len__(self)
   1026 if not self.valid:
   1027     raise InvalidDataIOError("Cannot get length of data. Data is not valid.")
-> 1028 return len(self.data)

TypeError: object of type 'SpikeInterfaceRecordingDataChunkIterator' has no len()

Operating System

Linux

Python Executable

Python

Python Version

3.9

Package Versions

No response

alejoe91 added the bug label Nov 21, 2022
@CodyCBakerPhD (Member)

Looking through the error stack, I expect this is more an issue with hdmf-zarr than with anything in NeuroConv, but I will try reproducing with much simpler examples to be sure. There is definitely something about how the DCI communicates with the final write attempt that differs from vanilla HDMF.

Not related to the primary issue, but note a nice new convenience we've added:

nwb_metadata = dict(session_description="toy_example", identifier="tpy", session_start_time=datetime.now())
nwbfile = NWBFile(**nwb_metadata)

which can be done as

from nwbinspector.tools import make_minimal_nwbfile

nwbfile = make_minimal_nwbfile()

@alejoe91 (Contributor, Author)

Nice! Thanks for the tip! Should we move the discussion over to hdmf-zarr then?

@CodyCBakerPhD (Member)

If you have the time (if not I'll try it later), try

(i) writing a TimeSeries directly to acquisition with the same data as the recording. Something like

from pynwb import TimeSeries
from nwbinspector.tools import make_minimal_nwbfile

nwbfile = make_minimal_nwbfile()
nwbfile.add_acquisition(TimeSeries(name="idk", data=recording.get_traces(), rate=1.))

if that works fine then

(ii) try the more fundamentally basic SliceableDataChunkIterator (this will end up in HDMF at some point; unlike the SpikeInterfaceRecordingDataChunkIterator, there is nothing NeuroConv-specific about its design)

from pynwb import TimeSeries
from neuroconv.tools.hdmf import SliceableDataChunkIterator

nwbfile = make_minimal_nwbfile()
nwbfile.add_acquisition(TimeSeries(name="idk", data=SliceableDataChunkIterator(recording.get_traces()), rate=1.))

@alejoe91 (Contributor, Author)

I'll do it right now!

@CodyCBakerPhD (Member)

If that second one gives a similar error then yes, hdmf-zarr needs to change something to be compatible with the existing data iteration tools.

I guess to be extra-sure, you could try

(iii) the previous iterator; not as efficient, but it 'always works' (? we'll see...)

from pynwb import TimeSeries
from hdmf.data_utils import DataChunkIterator

nwbfile = make_minimal_nwbfile()
nwbfile.add_acquisition(TimeSeries(name="idk", data=DataChunkIterator(recording.get_traces()), rate=1.))

@alejoe91 (Contributor, Author)

> If you have the time (if not I'll try it later), try (i) writing a TimeSeries directly to acquisition with the same data as the recording [...] if that works fine then (ii) try the more fundamentally basic SliceableDataChunkIterator [...]

All three examples run fine (though the DataChunkIterator is extremely slow!). So I guess it might be related to the SpikeInterfaceRecordingDataChunkIterator?

@CodyCBakerPhD (Member)

> So I guess it might be related to the SpikeInterfaceRecordingDataChunkIterator?

Can't quite conclude that just yet. Good to know the iterators work, though.

Ah... so the problem now is likely that we wrap a lot of things in H5DataIO in NeuroConv, but if you're using Zarr that is no longer necessary. (What is the method for specifying compression options with the Zarr backend, by the way?)

@alejoe91 (Contributor, Author)

For zarr you can use the following arguments:

  • compressor: any numcodecs-compatible compressor
  • chunks: tuple with chunk dimensions
  • filters: list of other numcodecs objects that are run before the compressor

Not sure how this is wrapped in hdmf-zarr; we need to take a look.
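
For illustration (not from the thread), a minimal sketch of how those three arguments look in plain zarr + numcodecs; the file name, chunk shape, and codec choices here are arbitrary:

import numpy as np
import zarr
from numcodecs import Blosc, Delta

data = np.random.rand(300_000, 4).astype("float32")

z = zarr.open(
    "traces.zarr",
    mode="w",
    shape=data.shape,
    dtype=data.dtype,
    chunks=(10_000, 4),                        # chunk dimensions (a shape, not a byte size)
    filters=[Delta(dtype="float32")],          # run before the compressor
    compressor=Blosc(cname="zstd", clevel=5),  # any numcodecs-compatible compressor
)
z[:] = data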

@CodyCBakerPhD (Member)

> chunks: tuple with chunk dimensions

Are these equivalent to HDF5 'chunks'? (~1 MB in size; our iterator buffers multiple chunks, 1 GB by default)

> Not sure how this is wrapped in hdmf-zarr; we need to take a look.

Cool, let me know. We're used to just specifying the string name plus a dict of options through the other wrapper, but if we need to specify a specific codec class we can do the string-to-class mapping ourselves in NeuroConv (and optionally, of course, allow the user to specify the class explicitly), as sketched below.
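
A hypothetical sketch of such a string-to-class mapping (illustrative only, not actual NeuroConv code):

from numcodecs import Blosc, GZip, Zstd

# Hypothetical registry of string names to numcodecs codec classes
compressor_classes = {"blosc": Blosc, "gzip": GZip, "zstd": Zstd}

def get_zarr_compressor(method, options=None):
    """Resolve a string name plus an options dict into a codec instance."""
    return compressor_classes[method](**(options or {}))

compressor = get_zarr_compressor("blosc", dict(cname="zstd", clevel=5))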

@alejoe91 (Contributor, Author)

> chunks: tuple with chunk dimensions
>
> Are these equivalent to HDF5 'chunks'? (~1 MB in size; our iterator buffers multiple chunks, 1 GB by default)

No, they are specified not by size but by shape, e.g. (1000, 100) or (1000, None).
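
For illustration (assuming zarr's convention that None in a chunk tuple means the full extent of that axis):

import zarr

z = zarr.zeros((300_000, 4), chunks=(1000, None))
print(z.chunks)  # (1000, 4) -- None expands to the full size of that axis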

> Cool, let me know [...]

Ok, sounds good. I'll keep testing different options in the coming days! Thanks for the help @CodyCBakerPhD

@oruebel commented Nov 21, 2022

Based on the error, I think this is probably an issue with the processing of the DataChunkIterator in hdmf-zarr. What is the data type of the data? I won't be able to look at this until later today. I created the following issue on hdmf-zarr as a reminder: hdmf-dev/hdmf-zarr#43

@alejoe91 (Contributor, Author)

For this toy example it's float

@CodyCBakerPhD (Member)

@oruebel @alejoe91 I just tested the example locally and determined it has nothing to do with the iterators: it's the fact that we automatically wrap all kinds of things in NeuroConv with H5DataIO, which has no meaning here, and that's what was causing the length problem.

A fix is in progress on #203, but it might take some discussion as to how we want to properly go about it (it could also factor into a more general overhaul of our compression option specification, which we've wanted to do for a while).
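
For illustration (not from the thread), a minimal sketch of that failure mode, reusing the SliceableDataChunkIterator from above; the exact wrapping inside write_recording may differ:

import numpy as np
from hdmf.backends.hdf5 import H5DataIO
from neuroconv.tools.hdmf import SliceableDataChunkIterator

# Wrap an iterator in H5DataIO, as NeuroConv does internally for HDF5 writes
wrapped = H5DataIO(data=SliceableDataChunkIterator(np.random.rand(300_000, 4)))

# ZarrIO falls through to DataIO.__len__, which calls len(self.data) on the
# iterator and raises TypeError, matching the traceback above
len(wrapped)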

As an aside, the resulting NWB file that I get out of this... is it supposed to look like a folder? This is my first time testing the Zarr backend.

@alejoe91 (Contributor, Author)

Yep! A Zarr store is indeed a structured folder ;)

@oruebel commented Nov 22, 2022

> the fact that we automatically wrap all kinds of things in NeuroConv with H5DataIO which clearly has no meaning here

That makes sense. Yes, ZarrIO does not know how to deal with H5DataIO. There is an equivalent ZarrDataIO class that works in much the same way as H5DataIO, but some of the options, in particular for compressors, are different. A basic tutorial on ZarrDataIO is here: https://hdmf-zarr.readthedocs.io/en/latest/tutorials/plot_zarr_dataset_io.html#sphx-glr-tutorials-plot-zarr-dataset-io-py
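
Following the linked tutorial, wrapping would look roughly like this (reusing recording and nwbfile from the snippets above; the exact option values are illustrative):

from numcodecs import Blosc
from hdmf_zarr import ZarrDataIO
from pynwb import TimeSeries

wrapped = ZarrDataIO(
    data=recording.get_traces(),  # any array-like
    chunks=(10_000, 4),           # chunk shape, as discussed above
    compressor=Blosc(cname="zstd", clevel=5),
)
nwbfile.add_acquisition(TimeSeries(name="idk", data=wrapped, rate=1.0))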

For basic setups, e.g., chunking only, we could in principle think about having some form of a "translator" to take H5DataIO objects and translate them to ZarrDataIO, but I don't think that will be possible in all cases, as HDF5 probably has settings that Zarr does not support and vice versa.
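
Such a translator might look roughly like this (a hypothetical sketch, not an official API; h5_to_zarr_dataio is an invented name):

from hdmf.backends.hdf5 import H5DataIO
from hdmf_zarr import ZarrDataIO

def h5_to_zarr_dataio(h5_io):
    # H5DataIO keeps its constructor options in .io_settings
    settings = h5_io.io_settings
    return ZarrDataIO(
        data=h5_io.data,
        chunks=settings.get("chunks"),  # chunk shape carries over directly
        # HDF5-specific options (e.g. compression="gzip", compression_opts=...)
        # have no 1:1 Zarr equivalent and would need mapping to numcodecs codecs
    )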

Let me know if there is anything I can help with to resolve this issue.

@h-mayorquin (Collaborator)

We should close this after we merge #578.

@h-mayorquin (Collaborator)

I am closing this, as it is now supported through the backend configuration.
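
For readers landing here later, a sketch of that workflow based on the NeuroConv backend-configuration docs (treat the exact names as assumptions):

from neuroconv.tools.nwb_helpers import (
    configure_backend,
    get_default_backend_configuration,
)

# Build a default Zarr configuration for the datasets in the file, then apply it
backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="zarr")
configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)

with NWBZarrIO("test_recording_zarr.nwb", "w") as io:
    io.write(nwbfile)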
