Skip to content

GH-10137: Cache session.list result in the AbstractInboundFileSynchronizer #10344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

artembilan
Copy link
Member

@artembilan artembilan commented Aug 21, 2025

Fixes: #10137

When maxFetchSize > 0, the AbstractInboundFileSynchronizer performs unnecessary Session.list() every time.
Then we go over all those files to filter.
And only after filtering we apply the maxFetchSize.

  • Add a cache for remote directory entries in the AbstractInboundFileSynchronizer
  • Guard cache interaction with a DefaultLockRegistry per directory
  • Rework AbstractInboundFileSynchronizer.transferFilesFromRemoteToLocal() to filter all the entries of the Session.list() on a first call. If maxFetchSize > 0, cache the remaining remote files

…InboundFileSynchronizer`

Fixes: spring-projects#10137

When `maxFetchSize > 0`, the `AbstractInboundFileSynchronizer` performs unnecessary
`Session.list()` every time.
Then we go over all those files to filter.
And only after filtering we apply the `maxFetchSize`.

* Add a cache for remote directory entries in the `AbstractInboundFileSynchronizer`
* Guard cache interaction with a `DefaultLockRegistry` per directory
* Rework `AbstractInboundFileSynchronizer.transferFilesFromRemoteToLocal()`
to filter all the entries of the `Session.list()` on a first call.
If `maxFetchSize > 0`, cache the remaining remote files
@artembilan artembilan requested a review from cppwfs August 21, 2025 15:50
@artembilan artembilan changed the title GH-10137: Cache session.list result in the `AbstractInboundFileSync… GH-10137: Cache session.list result in the AbstractInboundFileSynchronizer Aug 21, 2025
Copy link
Contributor

@cppwfs cppwfs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work.

Just the usual nitpicks.


@Override
protected boolean removeEldestEntry(Map.Entry<String, List<F>> eldest) {
return size() > 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the 100 to be configurable?

@SuppressWarnings("unchecked")
Map<String, List<String>> fetchCache = TestUtils.getPropertyValue(sync, "fetchCache", Map.class);
List<String> cachedFiles = fetchCache.get("testRemoteDirectory");
assertThat(cachedFiles).containsExactly("bar", "baz");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are replacing foo. We may go ahead and replace bar and baz.

Starting with version 7.0, the `AbstractInboundFileSynchronizer` caches a filtered `Session.list(remoteDirectory)` after applying a `maxFetchSize` slicing.
The logic of the `AbstractInboundFileSynchronizer.transferFilesFromRemoteToLocal()` method is the following:

- If `maxFetchSize > 0`, the lock is acquired against `remoteDirecy` to avoid race condition from different threads, when work is done around cache.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remoteDirectory is misspelled.


- If `maxFetchSize > 0`, the lock is acquired against `remoteDirecy` to avoid race condition from different threads, when work is done around cache.
The performance degradation is minimal since all the later synchronizations deal only with in-memory cached leftover;
- If no cache entry for the `remoteDirecy`, the `Session.list(remoteDirectory)` is called and all returned remote files are filtered;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

- the filtered result then sliced by the `maxFetchSize`;
- then these file entries are being transferred to the local directory;
- the rest of filtered remote files are cached for later synchronizations;
- if there is a cache entry for the `remoteDirecy`, such a list is sliced `maxFetchSize` and iterated for the transfer to the local directory;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

"such a list is sliced maxFetchSize" could be rewritten as " such a list is sliced to the maxFetchSize".

- then these file entries are being transferred to the local directory;
- the rest of filtered remote files are cached for later synchronizations;
- if there is a cache entry for the `remoteDirecy`, such a list is sliced `maxFetchSize` and iterated for the transfer to the local directory;
- if one of the transfers fails, the `filter` is reset from this remote file to the end of the filtered list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"if one of the transfers fails, the filter is reset from this remote file "
can be rewritten as
"if one of the transfers fails, the filter is reset from from the failed remote file".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shouldn't list remote file server each time max-fetch-size is reached
2 participants