Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rosette Summary file name #204

Open
upsonp opened this issue Feb 22, 2024 · 8 comments
Open

Rosette Summary file name #204

upsonp opened this issue Feb 22, 2024 · 8 comments

Comments

@upsonp
Copy link
Contributor

upsonp commented Feb 22, 2024

I've come across an issue with yet again how DFO does things. I'm processing BTL/ROS files from what we call a 'fix station' and the files have next to no metadata. The BTL file processes fine, but when I try to use ctd.read.from_summary() I get an error on line 451 name = _basename(fname)[1]

The files don't have a * FileName = xxxx line so there is no file name.

The ctd.read.from_btl() function solves this on lines 278-280 with:

    if "name" not in metadata:
        name = _basename(fname)[1]
        metadata["name"] = str(name)

I'd like to do the same thing for the ctd.read.from_cnv() function, called by the ctd.read.from_summary() function if it's not an issue.

@ocefpaf
Copy link
Member

ocefpaf commented Feb 22, 2024

I'd like to do the same thing for the ctd.read.from_cnv() function, called by the ctd.read.from_summary() function if it's not an issue.

Yes please. We should have that kind of fallback in all read functions.

@upsonp
Copy link
Contributor Author

upsonp commented Feb 22, 2024

It's a bit more complicated to fix the issue for the ctd.read.from_edf() function, the only other function to call ctd.read._basename().

The real issue is we're mixing passing a file path and an io.StringIO object to each of the reader functions, but then later we just make the assumption a file path was passed in and then call _basename() to get the path, name and extension.

I've fixed my issue and included a fixstation_hl_02.ros file with a unit test using the solution I described above, but maybe a better solution would be to deal with the issue ctd.read._basename() function causes.

currently the function looks like:

def _basename(fname):
    """Return file name without path."""
    if not isinstance(fname, Path):
        fname = Path(fname)
    path, name, ext = fname.parent, fname.stem, fname.suffix
    return path, name, ext

And when an io.StringIO object is passed in the fname = Path(fname) line throws an exception booting us back out to wherever the _basename call came from.

We can solve this in a couple of ways.

  1. we rewrite the _basename function:
def _basename(fname):
    """Return file name without path."""
    if isinstance(fname, io.StringIO):
        return 'unknown', 'unknown', 'unknown'

    if not isinstance(fname, Path):
        fname = Path(fname)
    path, name, ext = fname.parent, fname.stem, fname.suffix
    return path, name, ext
  1. we can use a try-except block wherever _basename is called so that if for any reason an exception is thrown we handle the issue specific to the function calling _basename.

@ocefpaf
Copy link
Member

ocefpaf commented Feb 26, 2024

The real issue is we're mixing passing a file path and an io.StringIO object to each of the reader functions, but then later we just make the assumption a file path was passed in and then call _basename() to get the path, name and extension.

Ah yes. I forgot about that! Regarding a more permanent solution, we should trow a hash or some sort of unique ID instead of unknown, for the filename. The extension and path can be unknown IMO. What do you think?

@upsonp
Copy link
Contributor Author

upsonp commented Feb 26, 2024

So something like:

def _basename(fname):
    """Return file name without path."""
    if isinstance(fname, io.StringIO):
        return 'unknown', str(hash(fname)), 'unknown'

    if not isinstance(fname, Path):
        fname = Path(fname)
    path, name, ext = fname.parent, fname.stem, fname.suffix
    return path, name, ext

@ocefpaf
Copy link
Member

ocefpaf commented Feb 26, 2024

Looks good. I would restrict the hash to few characters though. Maybe something like:

import hashlib
hashlib.shake_256(fname).hexdigest(5)

@ocefpaf
Copy link
Member

ocefpaf commented Feb 26, 2024

BTW, you don't have to tackle that here. If you are OK with the solution for your case we can open an issue for this and solve it in another PR.

@upsonp
Copy link
Contributor Author

upsonp commented Feb 26, 2024

That would be preferable. 😄

ocefpaf added a commit that referenced this issue Feb 27, 2024
added unit tests and fix for issue #204
@ocefpaf
Copy link
Member

ocefpaf commented Feb 27, 2024

That would be preferable. 😄

Let's leave this issue open though b/c we want to solve that at some point. I merged your PR and will mint a release soon. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants