Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates with new Files #77

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from
Draft

Updates with new Files #77

wants to merge 11 commits into from

Conversation

mavaylon1
Copy link
Contributor

No description provided.

@mavaylon1
Copy link
Contributor Author

@rly You (I believe) have local changes for command_line_interface.py that is somewhat reflected here. Can you make sure to add those in a PR.

),
object_name="ElectricalSeriesAp",
object_name="ElectricalSeries", # TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still a TODO item or can the comment be removed?

Copy link
Contributor

@oruebel oruebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I added a few additional minor changes, so I'll let @rly review and approve

@oruebel
Copy link
Contributor

oruebel commented Nov 21, 2024

It looks like one of the configuations may have an error. I'm seeing:

For parameters: 'https://dandi-api-staging-dandisets.s3.amazonaws.com/blobs/914/6aa/9146aa46-9c01-45be-9d2a-693e6a7bb778', 'ElectricalSeries', (slice(0, 30000, None), slice(0, 384, None))
             Traceback (most recent call last):
               File "/opt/conda/lib/python3.11/site-packages/asv_runner/server.py", line 179, in _run_server
                 _run(run_args)
               File "/opt/conda/lib/python3.11/site-packages/asv_runner/run.py", line 65, in _run
                 skip = benchmark.do_setup()
                        ^^^^^^^^^^^^^^^^^^^^
               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/time.py", line 80, in do_setup
                 result = Benchmark.do_setup(self)
                          ^^^^^^^^^^^^^^^^^^^^^^^^
               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/_base.py", line 632, in do_setup
                 setup(*self._current_params)
               File "/home/jovyan/nwb_benchmarks/src/nwb_benchmarks/benchmarks/time_remote_slicing.py", line 186, in setup
                 self.neurodata_object = get_object_by_name(nwbfile=self.nwbfile, object_name=object_name)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "/home/jovyan/nwb_benchmarks/src/nwb_benchmarks/core/_nwb_helpers.py", line 20, in get_object_by_name
                 raise ValueError(f"The specified object name ({object_name}) is not in the NWBFile.")
             ValueError: The specified object name (ElectricalSeries) is not in the NWBFile.

@oruebel

This comment was marked as duplicate.

@mavaylon1
Copy link
Contributor Author

Another error that I am seeing is for Zarr:


[62.50%] ··· time_remote_file_reading.NWBZarrFileReadBenchmark.time_read_zarr_nwbfile                                                                                                                                                                                       failed

[62.50%] ··· ============================================================== ========

                                         s3_url                                     

             -------------------------------------------------------------- --------

              s3://dandiarchive/zarr/2e8d0cb4-c5d4-4abc-88d8-2581c3cf7f5a/   failed 

              s3://dandiarchive/zarr/c8c6b848-fbc6-4f58-85ff-e3f2618ee983/   failed 

             ============================================================== ========

             For parameters: 's3://dandiarchive/zarr/2e8d0cb4-c5d4-4abc-88d8-2581c3cf7f5a/'

             Traceback (most recent call last):

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/server.py", line 179, in _run_server

                 _run(run_args)

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/run.py", line 72, in _run

                 result = benchmark.do_run()

                          ^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/_base.py", line 661, in do_run

                 return self.run(*self._current_params)

                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/time.py", line 165, in run

                 samples, number = self.benchmark_timing(

                                   ^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/time.py", line 258, in benchmark_timing

                 timing = timer.timeit(number)

                          ^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/timeit.py", line 180, in timeit

                 timing = self.inner(it, self.timer)

                          ^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "<timeit-src>", line 6, in inner

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/time.py", line 90, in func

                 self.func(*param)

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/mark.py", line 105, in wrapper

                 return func(*args, **kwargs)

                        ^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/mark.py", line 105, in wrapper

                 return func(*args, **kwargs)

                        ^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/mark.py", line 105, in wrapper

                 return func(*args, **kwargs)

                        ^^^^^^^^^^^^^^^^^^^^^

               File "/home/jovyan/nwb_benchmarks/src/nwb_benchmarks/benchmarks/time_remote_file_reading.py", line 273, in time_read_zarr_nwbfile

                 self.nwbfile, self.io = read_zarr_nwbfile(s3_url=s3_url, mode="r")

                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/home/jovyan/nwb_benchmarks/src/nwb_benchmarks/core/_streaming.py", line 264, in read_zarr_nwbfile

                 nwbfile = io.read()

                           ^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf/utils.py", line 668, in func_call

                 return func(args[0], **pargs)

                        ^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf/backends/io.py", line 56, in read

                 f_builder = self.read_builder()

                             ^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf/utils.py", line 668, in func_call

                 return func(args[0], **pargs)

                        ^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1335, in read_builder

                 f_builder = self.__read_group(self.__file, ROOT_NAME)

                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1400, in __read_group

                 sub_builder = self.__read_group(sub_group, sub_name)

                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1400, in __read_group

                 sub_builder = self.__read_group(sub_group, sub_name)

                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1409, in __read_group

                 self.__read_links(zarr_obj=zarr_obj, parent=ret)

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1428, in __read_links

                 target_name, target_zarr_obj = self.resolve_ref(link)

                                                ^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 734, in resolve_ref

                 target_zarr_obj = self.__open_file_consolidated(store=source_file,

                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 496, in __open_file_consolidated

                 return zarr.open_consolidated(store=store,

                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/zarr/convenience.py", line 1334, in open_consolidated

                 store = normalize_store_arg(

                         ^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/zarr/storage.py", line 197, in normalize_store_arg

                 return normalize_store(store, storage_options, mode)

                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/zarr/storage.py", line 169, in _normalize_store_arg_v2

                 raise ValueError("storage_options passed with non-fsspec path")

             ValueError: storage_options passed with non-fsspec path

             

             For parameters: 's3://dandiarchive/zarr/c8c6b848-fbc6-4f58-85ff-e3f2618ee983/'

             Traceback (most recent call last):

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/server.py", line 179, in _run_server

                 _run(run_args)

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/run.py", line 72, in _run

                 result = benchmark.do_run()

                          ^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/_base.py", line 661, in do_run

                 return self.run(*self._current_params)

                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/time.py", line 165, in run

                 samples, number = self.benchmark_timing(

                                   ^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/time.py", line 258, in benchmark_timing

                 timing = timer.timeit(number)

                          ^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/timeit.py", line 180, in timeit

                 timing = self.inner(it, self.timer)

                          ^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "<timeit-src>", line 6, in inner

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/time.py", line 90, in func

                 self.func(*param)

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/mark.py", line 105, in wrapper

                 return func(*args, **kwargs)

                        ^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/mark.py", line 105, in wrapper

                 return func(*args, **kwargs)

                        ^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/mark.py", line 105, in wrapper

                 return func(*args, **kwargs)

                        ^^^^^^^^^^^^^^^^^^^^^

               File "/home/jovyan/nwb_benchmarks/src/nwb_benchmarks/benchmarks/time_remote_file_reading.py", line 273, in time_read_zarr_nwbfile

                 self.nwbfile, self.io = read_zarr_nwbfile(s3_url=s3_url, mode="r")

                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/home/jovyan/nwb_benchmarks/src/nwb_benchmarks/core/_streaming.py", line 264, in read_zarr_nwbfile

                 nwbfile = io.read()

                           ^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf/utils.py", line 668, in func_call

                 return func(args[0], **pargs)

                        ^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf/backends/io.py", line 56, in read

                 f_builder = self.read_builder()

                             ^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf/utils.py", line 668, in func_call

                 return func(args[0], **pargs)

                        ^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1335, in read_builder

                 f_builder = self.__read_group(self.__file, ROOT_NAME)

                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1400, in __read_group

                 sub_builder = self.__read_group(sub_group, sub_name)

                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1400, in __read_group

                 sub_builder = self.__read_group(sub_group, sub_name)

                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1409, in __read_group

                 self.__read_links(zarr_obj=zarr_obj, parent=ret)

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 1428, in __read_links

                 target_name, target_zarr_obj = self.resolve_ref(link)

                                                ^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 734, in resolve_ref

                 target_zarr_obj = self.__open_file_consolidated(store=source_file,

                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/hdmf_zarr/backend.py", line 496, in __open_file_consolidated

                 return zarr.open_consolidated(store=store,

                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/zarr/convenience.py", line 1334, in open_consolidated

                 store = normalize_store_arg(

                         ^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/zarr/storage.py", line 197, in normalize_store_arg

                 return normalize_store(store, storage_options, mode)

                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

               File "/opt/conda/lib/python3.11/site-packages/zarr/storage.py", line 169, in _normalize_store_arg_v2

                 raise ValueError("storage_options passed with non-fsspec path")

             ValueError: storage_options passed with non-fsspec path

I'll take a look

@oruebel
Copy link
Contributor

oruebel commented Nov 21, 2024

In HDMF-Zarr, we use the logic:

# determine default storage options to use when opening a file from S3
storage_options = {}
if isinstance(path, str) and path.startswith(("s3://")):
    storage_options = dict(anon=True)

to determine if we need to set the storage options. I think we probably need to add this in these two functions:

def read_zarr(s3_url: str, open_without_consolidated_metadata: bool = False) -> zarr.Group:
"""
Open a Zarr file from an S3 URL using the built-in fsspec support in Zarr.
Returns
-------
file : zarr.Group
The zarr.Group object representing the opened file
"""
if open_without_consolidated_metadata:
zarrfile = zarr.open(store=s3_url, mode="r", storage_options=dict(anon=True))
else:
zarrfile = zarr.open_consolidated(store=s3_url, mode="r", storage_options=dict(anon=True))
return zarrfile
def read_zarr_nwbfile(s3_url: str, mode: str) -> Tuple[pynwb.NWBFile, hdmf_zarr.NWBZarrIO]:
"""
Read a Zarr NWB file from an S3 URL using the built-in fsspec support in Zarr.
Note: `r-` indicated reading without consolidated metadata, while `r` indicated reading with consolidated.
`r` should only be used in a benchmark for files that actually have consolidated metadata available,
for files without consolidated metadata, `hdmf_zarr` automatically reads without consolidated
metadata if no consolidated metadata is present.
Returns
-------
NWBFile : pynwb.NWBFile
The remote NWBFile object.
NWBZarrIO : hdmf_zarr.NWBZarrIO
The open IO object used to open the file.
"""
io = hdmf_zarr.NWBZarrIO(s3_url, mode=mode, storage_options=dict(anon=True))
nwbfile = io.read()
return (nwbfile, io)

@mavaylon1
Copy link
Contributor Author

In HDMF-Zarr, we use the logic:


# determine default storage options to use when opening a file from S3

storage_options = {}

if isinstance(path, str) and path.startswith(("s3://")):

    storage_options = dict(anon=True)

to determine if we need to set the storage options. I think we probably need to add this in these two functions:

def read_zarr(s3_url: str, open_without_consolidated_metadata: bool = False) -> zarr.Group:
"""
Open a Zarr file from an S3 URL using the built-in fsspec support in Zarr.
Returns
-------
file : zarr.Group
The zarr.Group object representing the opened file
"""
if open_without_consolidated_metadata:
zarrfile = zarr.open(store=s3_url, mode="r", storage_options=dict(anon=True))
else:
zarrfile = zarr.open_consolidated(store=s3_url, mode="r", storage_options=dict(anon=True))
return zarrfile
def read_zarr_nwbfile(s3_url: str, mode: str) -> Tuple[pynwb.NWBFile, hdmf_zarr.NWBZarrIO]:
"""
Read a Zarr NWB file from an S3 URL using the built-in fsspec support in Zarr.
Note: `r-` indicated reading without consolidated metadata, while `r` indicated reading with consolidated.
`r` should only be used in a benchmark for files that actually have consolidated metadata available,
for files without consolidated metadata, `hdmf_zarr` automatically reads without consolidated
metadata if no consolidated metadata is present.
Returns
-------
NWBFile : pynwb.NWBFile
The remote NWBFile object.
NWBZarrIO : hdmf_zarr.NWBZarrIO
The open IO object used to open the file.
"""
io = hdmf_zarr.NWBZarrIO(s3_url, mode=mode, storage_options=dict(anon=True))
nwbfile = io.read()
return (nwbfile, io)

I'm not sure what you mean. Those two functions already have anon=True defined.

@mavaylon1
Copy link
Contributor Author

It looks like one of the configuations may have an error. I'm seeing:

For parameters: 'https://dandi-api-staging-dandisets.s3.amazonaws.com/blobs/914/6aa/9146aa46-9c01-45be-9d2a-693e6a7bb778', 'ElectricalSeries', (slice(0, 30000, None), slice(0, 384, None))
             Traceback (most recent call last):
               File "/opt/conda/lib/python3.11/site-packages/asv_runner/server.py", line 179, in _run_server
                 _run(run_args)
               File "/opt/conda/lib/python3.11/site-packages/asv_runner/run.py", line 65, in _run
                 skip = benchmark.do_setup()
                        ^^^^^^^^^^^^^^^^^^^^
               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/time.py", line 80, in do_setup
                 result = Benchmark.do_setup(self)
                          ^^^^^^^^^^^^^^^^^^^^^^^^
               File "/opt/conda/lib/python3.11/site-packages/asv_runner/benchmarks/_base.py", line 632, in do_setup
                 setup(*self._current_params)
               File "/home/jovyan/nwb_benchmarks/src/nwb_benchmarks/benchmarks/time_remote_slicing.py", line 186, in setup
                 self.neurodata_object = get_object_by_name(nwbfile=self.nwbfile, object_name=object_name)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "/home/jovyan/nwb_benchmarks/src/nwb_benchmarks/core/_nwb_helpers.py", line 20, in get_object_by_name
                 raise ValueError(f"The specified object name ({object_name}) is not in the NWBFile.")
             ValueError: The specified object name (ElectricalSeries) is not in the NWBFile.

Found the issue. Pushing fix.

@mavaylon1
Copy link
Contributor Author

In HDMF-Zarr, we use the logic:

# determine default storage options to use when opening a file from S3
storage_options = {}
if isinstance(path, str) and path.startswith(("s3://")):
    storage_options = dict(anon=True)

to determine if we need to set the storage options. I think we probably need to add this in these two functions:

def read_zarr(s3_url: str, open_without_consolidated_metadata: bool = False) -> zarr.Group:
"""
Open a Zarr file from an S3 URL using the built-in fsspec support in Zarr.
Returns
-------
file : zarr.Group
The zarr.Group object representing the opened file
"""
if open_without_consolidated_metadata:
zarrfile = zarr.open(store=s3_url, mode="r", storage_options=dict(anon=True))
else:
zarrfile = zarr.open_consolidated(store=s3_url, mode="r", storage_options=dict(anon=True))
return zarrfile
def read_zarr_nwbfile(s3_url: str, mode: str) -> Tuple[pynwb.NWBFile, hdmf_zarr.NWBZarrIO]:
"""
Read a Zarr NWB file from an S3 URL using the built-in fsspec support in Zarr.
Note: `r-` indicated reading without consolidated metadata, while `r` indicated reading with consolidated.
`r` should only be used in a benchmark for files that actually have consolidated metadata available,
for files without consolidated metadata, `hdmf_zarr` automatically reads without consolidated
metadata if no consolidated metadata is present.
Returns
-------
NWBFile : pynwb.NWBFile
The remote NWBFile object.
NWBZarrIO : hdmf_zarr.NWBZarrIO
The open IO object used to open the file.
"""
io = hdmf_zarr.NWBZarrIO(s3_url, mode=mode, storage_options=dict(anon=True))
nwbfile = io.read()
return (nwbfile, io)

@oruebel I believe I have a fix. That being said it is in resolve_ref, which we have updated with export that will be merged tomorrow. This "fix" is tape to get the file readable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants