Presigned URLs can become invalid in LakeFSLoader.load when Unstructured is slow
#29130
Open
Labels
🤖:bug
Related to a bug, vulnerability, unexpected error with an existing feature
Checked other resources
Example Code
Unfortunately, reproducing this requires a LakeFS configuration, so it isn't straightforward (and the behavior may also depend on your specific LakeFS configuration). Basically, just call ls_objects with presign=True, wait a while, and then try to access one of the returned URLs (which is what the LakeFSLoader does internally).
Error Message and Stack Trace (if applicable)
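Because a real reproduction needs a LakeFS deployment, the timing problem can also be illustrated with a self-contained simulation. This is a sketch, not the loader's code: `FakeClock`, `FakeStore`, `ExpiredURLError`, `load_eager`, `load_just_in_time`, the 60-second URL lifetime and the 25-second parse time are all hypothetical stand-ins (real presigned-URL lifetimes depend on your LakeFS and storage configuration). `load_eager` presigns every object at listing time, as ls_objects with presign=True does, and only fetches each one after the previous documents have been parsed; `load_just_in_time` requests a fresh URL immediately before each fetch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class ExpiredURLError(Exception):
    """Hypothetical error raised when a simulated presigned URL is used too late."""


@dataclass
class FakeClock:
    """Manually advanced clock so the simulation is deterministic."""
    now: float = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class FakeStore:
    """Hypothetical stand-in for LakeFS plus object storage issuing presigned URLs."""
    clock: FakeClock
    ttl: float  # assumed URL lifetime in seconds
    objects: Dict[str, bytes] = field(default_factory=dict)
    _issued: Dict[str, Tuple[str, float]] = field(default_factory=dict, init=False, repr=False)

    def presign(self, path: str) -> str:
        # Each URL records the object it points to and when it expires.
        url = f"https://storage.example/{path}?token={len(self._issued)}"
        self._issued[url] = (path, self.clock.now + self.ttl)
        return url

    def ls(self, prefix: str, presign: bool) -> List[Tuple[str, Optional[str]]]:
        # Loosely mirrors (path, physical_address) pairs; with presign=True
        # every URL is issued now, at listing time.
        paths = sorted(p for p in self.objects if p.startswith(prefix))
        return [(p, self.presign(p) if presign else None) for p in paths]

    def get(self, url: str) -> bytes:
        path, expires_at = self._issued[url]
        if self.clock.now > expires_at:
            raise ExpiredURLError(f"presigned URL for {path!r} has expired")
        return self.objects[path]


def load_eager(store: FakeStore, prefix: str, parse_seconds: float) -> List[bytes]:
    """Presign everything up front, then fetch and slowly parse one object at a time."""
    docs = []
    for _path, url in store.ls(prefix, presign=True):
        docs.append(store.get(url))
        store.clock.advance(parse_seconds)  # simulated slow Unstructured parsing
    return docs


def load_just_in_time(store: FakeStore, prefix: str, parse_seconds: float) -> List[bytes]:
    """List paths up front, but presign each object right before fetching it."""
    docs = []
    for path, _ in store.ls(prefix, presign=False):
        docs.append(store.get(store.presign(path)))
        store.clock.advance(parse_seconds)
    return docs
```

With four objects, a 60-second lifetime and 25 seconds of parsing per document, `load_eager` fetches the fourth object at t=75 and raises `ExpiredURLError`, while `load_just_in_time` loads all four. In the real loader, the analogous change would be to obtain a fresh presigned URL per object shortly before requests.get rather than reusing the ones from the initial listing; whether that is the preferred fix is a judgment call for the maintainers.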
Description
When loading many or large documents with the LakeFSLoader, quite a bit of time frequently passes between the call to ls_objects at line 109 and the call to requests.get on line 172. This is because Unstructured can be very slow (insisting on "repairing" and OCRing perfectly good PDFs, for instance). As a result, the presigned URLs that LakeFS returns from the call to ls_objects may have expired by the time we actually access them.
System Info
System Information
Package Information
Optional packages not installed
Other Dependencies