Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues using non-zero bin-overlaps #127

Open
caplinje-NOAA opened this issue Jul 2, 2024 · 4 comments
Open

Issues using non-zero bin-overlaps #127

caplinje-NOAA opened this issue Jul 2, 2024 · 4 comments
Assignees

Comments

@caplinje-NOAA
Copy link
Contributor

caplinje-NOAA commented Jul 2, 2024

I've run into an issue using bin overlaps in file class methods which call _bins, for example if attempting to do this:

file = pypam.AcuFile(sfile, AMAR, 1.)
rms1s = file.rms(binsize=1.0,bin_overlap=0.1)

In this case (i.e. using float bin_overlaps between 0 and 1), soundfile.blocks throws a type error:

TypeError: slice indices must be integers or None or have an __index__ method

I believe that maybe in the sf.blocks call in the _bins method (below), overlap should be set to 'noverlap', rather than 'bin_overlap', as the latter is the integer/sample representation of the fractional overlap.

for i, block in tqdm(enumerate(sf.blocks(self.file_path, blocksize=blocksize, start=self._start_frame,
                                         overlap=bin_overlap, always_2d=True, fill_value=0.0)),
                     total=n_blocks, leave=False, position=0):

Doing this solves the error (or at least the file.rms call doesn't error), however, the resulting dataset does not appear to show start_sample and end_sample values which I would expect, if there were some overlap. This is the resulting dataset if I replace bin_overlap with noverlap:

<xarray.Dataset>
Dimensions:       (id: 67, band: 1)
Coordinates:
  * id            (id) int32 0 1 2 3 4 5 6 7 8 9 ... 58 59 60 61 62 63 64 65 66
    datetime      (id) datetime64[ns] 2024-07-02T14:48:30.706957 ... 2024-07-...
    start_sample  (id) int32 0 64000 128000 192000 ... 4096000 4160000 4224000
    end_sample    (id) int32 64000 128000 192000 ... 4160000 4224000 4288000
  * band          (band) int32 0
    low_freq      (band) int32 0
    high_freq     (band) float64 3.2e+04
Data variables:
    rms           (id, band) float64 135.8 114.8 135.3 ... 137.0 113.5 137.3
Attributes: (12/14)
    file_path:               data/exfile.wav
    timezone:                UTC
    datetime_timezone:       UTC
    p_ref:                   1.0
    channel:                 0
    _start_frame:            0
                     ...
    fs:                      64000
    hydrophone_name:         AMAR 1
    hydrophone_model:        Geospectrum
    hydrophone_sensitivity:  -166.6
    hydrophone_preamp_gain:  0
    hydrophone_Vpp:          1

I appreciate any help here, and have really been enjoying learning this package.

@cparcerisas
Copy link
Collaborator

Hello! This should be fixed in the new branch feature/gridded_data
I am running some extra testing and will merge the pull request as soon as I am happy with it.
If you want to test that branch already that would be helpful!

ps. sorry for the delay in fixing this, I was taking a long break after my phd

@caplinje-NOAA
Copy link
Contributor Author

Thank you and no worry about the delay. I will test this out and compare it to my fix.

@cparcerisas
Copy link
Collaborator

Yeah I changed more things than that, I hope that is not a problem. It was a long time that I wanted to offer support for "gridded" data (each bin starts at second 0).
But there was indeed a mistake in the code for non-zero windows.
Now the _n_blocks _bins function have been changed, and should be correct. If you could check it I would really appreciate!

@caplinje-NOAA
Copy link
Contributor Author

I have committed my solution to a clean fork of your project here. I realize that you are addressing this in a different way (and entirely refactoring the _bins method), so I thought I would ask before I submitted a pull request. Also, apologies for not testing that branch yet, I am in the middle of some analysis and want to stick with what is working.

The fix is small but mostly is just replacing bin_overlap with noverlap in the sf.blocks call, but also corrects the time keeping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants