-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-19346. ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent() #7187
Conversation
…ck with ConcurrentHashMap/putIfAbsent()
@ctrezzo: Could you take a look? thanks, |
💔 -1 overall
This message was automatically generated. |
...p-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java
Outdated
Show resolved
Hide resolved
...p-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java
Outdated
Show resolved
Hide resolved
+1 to @pjfanning's comments. In fact, in principle all you need is just one call to computeIfAbsent() which takes care of thread safety, concurrency, and at-most-once cost for the computation. No need to call contains() or get() in addition. The original commit is not correct in a concurrent situation FYI. |
...p-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java
Outdated
Show resolved
Hide resolved
Can you share some rationale or tabletop concurrent benchmark tests to show the improvements with the new approach from the old? I do think the ConcurrentHashMap is a fine implementation, but I'm curious what led you to look into the ReadWriteLock implementation. |
We are running a hadoop version without a lock protecting the InnerCache.get() and basically rediscovered this bug, after two months' investigation. I was about to backport the ReadWriteLock approach from trunk. Then, a team member suggested that we could just use ConcurrentHashMap here, which provides a more clean approach than the current ReadWriteLock approach. Performance-wise, I think they should be similar? Only one thread will be allowed to create the new FileSystem instance for each missing key in both cases. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Oh so was this due to a correctness bug not performance that led you to this? It wasn't very clear from the JIRA description.
ConcurrentHashMap is better because RWL locks globally whenever there is a mutation whereas ConcurrentHashMap can be more granular. |
yes, correctness issue: we are getting stale/incorrect reads from observer nodes (fileLen=0 for non-zero bytes files which has been closed by the same FileSystem object). We finally narrowed it down the RC: we are using two different dfs clients/ObserverReadProxyProviders, though the key (scheme/authority) to InnerCache.map.get() is the same. Then, we compared trunk and our internal version. We found in our internal version (based off 2.10.0), we don't have the lock to prevent concurrent updates to InnerCache.map. |
I see. What you're saying is, your internal version has the RWL code removed and you are seeing a correctness issue. You're NOT saying the upstream version has a correctness issue, correct? From my cursory look, I think the existing code is correct in a concurrent setting. If that's the case, my original comment basically stands. From the community standpoint, it would not be fixing a correctness issue but rather improving implementations and/or performance. While I think the new version is probably a better one, it would be good to see how both versions behave in a pretty high-concurrency situation. It shouldn't be too difficult to test quickly. WDYT? |
Correct, our internal version simply did not have that PR, which added the lock. and it is our internal version which has the correctness issue. Trunk version is good. We are not trying to improve the performance here. The concurrentHashMap is a cleaner approach than the old lock based approach. You already mentioned concurrentHashMap provides finer-grain concurrency and should improve performance. So, it is a win-win in both code-readable/maintainability and performance. tbh, I did not want to spend time benchmarking RWLock vs concurrentHashMap. Both are standard constructs that we are supposed to use. This is client-side code: most of the overhead will be from communicating with NN/DN, vs competing for locks here. |
💔 -1 overall
This message was automatically generated. |
+1 from me. Can you also update the JIRA description to match what it is? Since I haven't committed PRs in a long while, it would take me a long time to get familiar with the steps. @ctrezzo, can you please review and commit if you approve? Thanks. |
Added more details to jira ticket. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 It looks good to me as well. Thank you for the patch @xinglin !
Description of PR
HADOOP-19346. ViewFileSystem.InnerCache: Replaced ReentrantReadWriteLock with ConcurrentHashMap/putIfAbsent()
How was this patch tested?
mvn package -Pdist -PskipTests
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?