GH-3879: Add cache to optimize header match performance. #3934
Conversation
Fixes: spring-projects#3879
Issue link: spring-projects#3879

What
Add an LRU cache for the pattern matching done by KafkaHeaderMapper.

Why?
To reduce the CPU time KafkaHeaderMapper spends on pattern matching. Commonly, many Kafka records in the same topic carry the same header names. Currently, pattern matching has O(M*N) time complexity, where M is the pattern length and N is the header name length. If the results of pattern matching are cached and reused, KafkaHeaderMapper can expect a noticeable improvement in CPU usage.

Signed-off-by: Sanghyeok An <[email protected]>
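As a rough illustration (not the actual change in this PR), caching the match outcome per header name could look like the sketch below; the class name, the cache size of 1000, and the lower-casing of header names are assumptions for the example.

import java.util.Locale;

import org.springframework.util.ConcurrentLruCache;
import org.springframework.util.PatternMatchUtils;

class CachingHeaderMatcher {

	private final String pattern;

	// Keyed by lower-cased header name; the value is whether the pattern matched.
	private final ConcurrentLruCache<String, Boolean> matchResults;

	CachingHeaderMatcher(String pattern) {
		this.pattern = pattern.toLowerCase(Locale.ROOT);
		this.matchResults = new ConcurrentLruCache<>(1000,
				headerName -> PatternMatchUtils.simpleMatch(this.pattern, headerName));
	}

	boolean matchHeader(String headerName) {
		// Repeated header names (the common case for records of one topic) are
		// answered from the LRU cache instead of re-running the O(M*N) match.
		return this.matchResults.get(headerName.toLowerCase(Locale.ROOT));
	}
}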
Just one nit-pick.
I love everything else.
Thank you again!
Right, my bad. The real problem is that we always go through pattern matching when we could just cache the matching outcome and skip it when we see the same header name in the next record. So, it sounds like we have 3 states. Does this make sense?
@artembilan
// 1. Already cached as matching a multi-value pattern.
if (this.headerMatchedCache.isMultiValuePattern(headerName)) {
	return true;
}
// 2. Already cached as matching a single-value pattern.
if (this.headerMatchedCache.isSingleValuePattern(headerName)) {
	return false;
}
// Reaching this point means the header is still in the 'don't map' state.
// 3. In case the pattern matches as multi-value:
this.headerMatchedCache.cacheAsMultiValueHeader(headerName);
...
// 4. In case the pattern doesn't match as multi-value:
this.headerMatchedCache.cacheAsSingleValueHeader(headerName);
After this step, the header is no longer in the 'don't map' state. This is also the reason why I used two separate caches. IMHO, the race condition that might occur here should be acceptable. What do you think?
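For illustration only, here is a minimal sketch of what a helper such as HeaderPatternMatchCache behind the pseudocode above could look like. The method names follow the snippet; backing the two states with ConcurrentLruCache (used here as a bounded, thread-safe set) and the size of 1000 are assumptions, not the PR's final implementation.

import org.springframework.util.ConcurrentLruCache;

class HeaderPatternMatchCache {

	// Headers already decided to match a multi-value pattern.
	private final ConcurrentLruCache<String, Boolean> multiValueHeaders =
			new ConcurrentLruCache<>(1000, headerName -> Boolean.TRUE);

	// Headers already decided to match a single-value pattern.
	private final ConcurrentLruCache<String, Boolean> singleValueHeaders =
			new ConcurrentLruCache<>(1000, headerName -> Boolean.TRUE);

	boolean isMultiValuePattern(String headerName) {
		return this.multiValueHeaders.contains(headerName);
	}

	boolean isSingleValuePattern(String headerName) {
		return this.singleValueHeaders.contains(headerName);
	}

	void cacheAsMultiValueHeader(String headerName) {
		this.multiValueHeaders.get(headerName); // populates the entry
	}

	void cacheAsSingleValueHeader(String headerName) {
		this.singleValueHeaders.get(headerName); // populates the entry
	}

	// A header present in neither cache is still in the 'don't map yet' state.
}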
I see now. Still, it breaks the existing contract. What do you think?
@artembilan Currently, the existing matcher methods have different responsibilities and contracts, which also matters in the context of header mapping. Furthermore, I believe we should respect reserved header names for safety. To summarize:
For these reasons, I think it would be better to keep the current algorithm.
Right, I meant something slightly different; the rest of your analysis makes sense. On the other hand, I wonder if we can come up with a different abstraction for this.
@artembilan I have considered introducing a new abstraction as well; however, even if we introduce one, I don't think it would fully solve the problem. By the way, through our conversation, I came up with another idea: each SimplePatternBasedHeaderMatcher could keep its own caches.
For example:

protected static class SimplePatternBasedHeaderMatcher implements HeaderMatcher {

	private final ConcurrentLruCache<String, Boolean> matchedCache = new ConcurrentLruCache<>(1000, ...);

	private final ConcurrentLruCache<String, Boolean> notMatchedCache = new ConcurrentLruCache<>(1000, ...);

	public boolean matchHeader(String headerName) {
		String header = headerName.toLowerCase(...);
		if (this.matchedCache.contains(header)) {
			return true;
		}
		if (this.notMatchedCache.contains(header)) {
			return false;
		}
		// Keep the original logic and add the result to the appropriate cache.
		...
	}
}

When using header names of around 50 characters as keys (2 bytes per char), each cache would take roughly (2 bytes * 50) * 1000 entries = 100,000 bytes ≈ 100KB. Therefore, using this approach would require about 200KB of additional memory per mapper, but it could lead to a cleaner implementation. If my suggestion is heading in an unreasonable direction, I apologize.
This is a great conversation, @chickenchickenlove, and I appreciate all your thinking and patience with my stubbornness.
Here is what I'm thinking about this caching implementation:
AbstractKafkaHeaderMapper.zip
Sorry about the zip file; it is not easy to show the different parts of the class changes otherwise.
Pay attention that your current HeaderPatternMatchCache is fully eliminated.
@artembilan Thanks for your patience!! 🙇♂️
LGTM
Let’s hear from @sobychacko for a double check, since we have gone back and forth with this feature for a while.
Thank you again!
Thanks for your review!!!
Fixes: #3879
Issue link: #3879

In the previous discussion, we decided to use LinkedHashMap for the LruCache implementation. However, I think we missed the concurrency of the KafkaListenerContainer. We can set concurrency when we create a KafkaListener, for example @KafkaListener(concurrency = 10, ....). But LinkedHashMap is not thread-safe, so I use ConcurrentLruCache provided by spring-framework to ensure thread-safety.
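As a point of reference, here is a minimal, illustrative sketch (not the code in this PR) contrasting the two options above: an access-ordered LinkedHashMap provides LRU eviction but needs external synchronization, while ConcurrentLruCache from spring-framework can be shared safely across the container's consumer threads. The class name, cache sizes, and the placeholder match function are assumptions for the example.

import java.util.LinkedHashMap;
import java.util.Map;

import org.springframework.util.ConcurrentLruCache;

class LruCacheExamples {

	// An access-ordered LinkedHashMap evicts the eldest entry once the limit is
	// exceeded, but it is NOT thread-safe: with @KafkaListener(concurrency = 10)
	// every access would need external synchronization.
	static <K, V> Map<K, V> linkedHashMapLru(int maxEntries) {
		return new LinkedHashMap<K, V>(16, 0.75f, true) {
			@Override
			protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
				return size() > maxEntries;
			}
		};
	}

	// ConcurrentLruCache computes the value on a cache miss via the supplied
	// function (a placeholder match here) and is safe to share across threads.
	static ConcurrentLruCache<String, Boolean> concurrentLru(int maxEntries) {
		return new ConcurrentLruCache<>(maxEntries, headerName -> headerName.startsWith("my-header"));
	}
}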