Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve handling of symlinks in poll OR re-introduce get (or equivalent functionality) for polls #1298

Open
reidsunderland opened this issue Nov 15, 2024 · 10 comments
Labels
Discussion_Needed developers should discuss this issue. enhancement New feature or request

Comments

@reidsunderland
Copy link
Member

I'm mostly documenting that there is at least one case where get is useful, not necessarily saying we need to work on re-implementing it.

We just had a problem with a v2 SFTP poll, where we are polling a directory that contains many symlinks to files.

Previously, this directory presumably did not contain symlinks, so the poll worked fine. Sarra v2 would list the directory, the directory listing would report the actual file sizes, and everything worked.

The source made a change, so that the directory listing reports the size of the symlinks in the directory.

e.g.

ls /somedir/
100 file1
100 file2

Which is much smaller than the actual file sizes. I worked around this problem by migrating the v2 sarra to sr3 and turning on acceptSizeWrong.

If we do ls /somedir/*, the SFTP server would return the correct file sizes:

ls /somedir/*
23132 file1
8943 file2

An alternative solution would have been changing the v2 poll to use get * for each directory. But I didn't do this because I knew it wouldn't work with sr3.

I also tried path /somedir/* on sr3, but that fails because it tries to cd to that directory:

[WARNING] sarracenia.flowcb.poll cd sr_poll/cd: could not cd to directory /somedir/*
@reidsunderland reidsunderland added the Discussion_Needed developers should discuss this issue. label Nov 15, 2024
@petersilva
Copy link
Contributor

petersilva commented Nov 15, 2024

in sr3, the structure returned from sftp is SFTPAttributes... ... we can already test if it is a link...

had a code snippet here... but it's wrong... see later comment...

We could add logic, perhaps based on option follow_symlinks where, if we find a link, we either don't return it at all, or return followed type... (by doing a stat of the destination and returning that instead...)

@petersilva
Copy link
Contributor

might have to do that in the ls routine of the sftp driver though...

@petersilva petersilva added the enhancement New feature or request label Nov 15, 2024
@petersilva
Copy link
Contributor

This is where I meant:

try:
dir_attr = self.sftp.listdir_attr()
finally:
alarm_cancel()
for index in range(len(dir_attr)):
attr = dir_attr[index]
line = attr.__str__()
self.line_callback(line, attr)

we post process the SFTPAttributes records being looped there... and we can test for a symlink, and replace the record with the destination... we could do that either all the time, or conditionally on whether follow_symlink is set.

@petersilva
Copy link
Contributor

petersilva commented Nov 15, 2024

oh... and to say v2 works with an ascii listing... where sr3 works on the SFTPAttributes... we should see an 'l' at the beginning of the line to be able to deduce that it is a link, and do a similar thing... I think it is easier/more worthwhile to do based on SFTPAttributes in sr3.

@reidsunderland
Copy link
Member Author

At least on this server, the directory listing does show that the file is a symlink (lrwxrwxrwx) but doesn't show where the target is.

stat'ing each file is probably a good option, and we talked about implementing for HTTP #1131 polls already. But it would cause a lot more CPU usage and load on the remote server.

Maybe setting the size to 0 when we see a symlink is also a possibility?

@reidsunderland reidsunderland changed the title re-introduce get (or equivalent functionality) for polls re-introduce get (or equivalent functionality) for polls OR improve handling of symlinks in poll Nov 15, 2024
@petersilva
Copy link
Contributor

petersilva commented Nov 15, 2024

in sr3: the ls_attr call already present returns SFTPAttributes for every file in the directory... we don't need to stat again. In the SFTPAttributes record already returned is the same record returned by stat. There is already... the mode field which can be queried to find if it is a link... we do need to issue a readlink for each link we find if we want to find the destination... SFTP protocol has readlink...

in v2: the ls_attr is re-constructed into a string to return something that looks like what is returned by an FTP server... but the same sort of logic is there. it's just that v2 throws away the stat records and returns the string.

@reidsunderland
Copy link
Member Author

reidsunderland commented Nov 15, 2024

I was imagining that stat'ing the file symlink would give the stat of the actual file. But I guess we would need to do readlink, then stat the file path that is returned by readlink

@petersilva
Copy link
Contributor

options:

  • should confirm how sr_poll produces the message ... it should produce a "link" message and not one for a regular file (with the stat record for the link.) that would be the correct way to poll.
  • follow_symlinks is entirely un-implemented in sr_poll... should probably pay attention to that...
  • we can either create a link message... with link info... or a artificially create a file message by reading the link...
  • two relevant options... fileEvents: link, and self.o.follow_symlinks... figure out whether ...
    so if "fileEvents" does not contain links, and follow_symlinks=True... then link entries should be replaced by corresponding file data. <-- very likely the most commonly preferred behaviour.

weirdness:

  • if follow_symlinks is set... then documentation says we should return two messages... one for the link, and one for the file.... but the file might be in a completely different directory... so ls would end up returning files that are not in the directory... feels odd.
  • maybe the documentation is wrong? somebody should run a watch with a link and follow_symlinks set, and see what it posts...

@petersilva
Copy link
Contributor

currently poll is v2... these changes are too involved to do there... so likely want to test with sr3...

@petersilva
Copy link
Contributor

petersilva commented Nov 18, 2024

back to talking about get...

  • file globbing ... different grammar than regexes...
  • save client side processing and traffic by getting it done servers side (smaller ls results.)
  • v2 had get + directory ... whereas sr3 uses path.

if we just use path ... the what if someone does:


 path /a/*/hoho/*gw*

(ie. a path that is not within a single directory.) ... current logic uses cd to get to a directory, and then does ls (no args) within it... that logic would be broken by this... so some kind of refactor needed.

suggestion to use path (to establish baseline) and have a second argument for pattern within it. An alternate syntax...

  path a/1/hoho  *gw* 
  path a */hoho/*gw*

having the pattern as explicitly within a path (separate argument)

Another wrinkle.. the globbing pattern is a server-side grammar... need to know the server-side grammar to specify... e.g. I believe windows has a different globbing grammar.

Won't work on http... so API differences for different protocols.

@reidsunderland reidsunderland changed the title re-introduce get (or equivalent functionality) for polls OR improve handling of symlinks in poll improve handling of symlinks in poll OR re-introduce get (or equivalent functionality) for polls Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion_Needed developers should discuss this issue. enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants