YARN-11387. [GPG] YARN GPG mistakenly deleted applicationid. #6660
base: trunk
🎊 +1 overall
This message was automatically generated.
@@ -46,47 +45,38 @@ public void run() {
    LOG.info("Application cleaner run at time {}", now);

    FederationStateStoreFacade facade = getGPGContext().getStateStoreFacade();
Step 1: Retrieve all applications stored in the StateStore. These are all the applications submitted through the Router.
Step 2: Use the Router's REST API to fetch all running applications. The Router collects them from all active SubClusters.
Step 3: Compare the results of Step 1 and Step 2, and delete the applications that appear in Step 1 but not in Step 2.
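The comparison in Step 3 is a set difference. The following is a minimal Java sketch of it, assuming both inputs are plain application-ID strings; the class `ApplicationCleanerSketch` and the method `computeAppsToDelete` are hypothetical names for illustration, not the actual GPG ApplicationCleaner code (it also assumes Java 9+ for `List.of`).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the three-step cleanup described above.
 * Names are illustrative only and are not the GPG ApplicationCleaner API.
 */
public class ApplicationCleanerSketch {

  /**
   * Step 3: every application recorded in the StateStore (Step 1) that the
   * Router did not report as running (Step 2) becomes a deletion candidate.
   */
  public static Set<String> computeAppsToDelete(
      List<String> stateStoreApps, List<String> routerRunningApps) {
    // LinkedHashSet keeps the StateStore order, which makes output readable.
    Set<String> toDelete = new LinkedHashSet<>(stateStoreApps);
    // Set difference: drop everything the Router still reports as running.
    toDelete.removeAll(routerRunningApps);
    return toDelete;
  }

  public static void main(String[] args) {
    Set<String> toDelete = computeAppsToDelete(
        List.of("app1", "app2", "app3"), List.of("app2"));
    System.out.println(toDelete); // prints [app1, app3]
  }
}
```

Note that the result depends entirely on how complete the Step 2 list is: any running application missing from it is selected for deletion.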
There is a potential issue with this approach. If a SubCluster is unavailable, for example because its RM is restarting during maintenance, Step 2 cannot fetch the complete list of running applications. The comparison in Step 3 may then delete applications that are still running.
For example, suppose we have three SubClusters, subClusterA, subClusterB, and subClusterC, with an equal allocation ratio of 1:1:1.
We submit six applications through routerA: app1 and app2 are allocated to subClusterA, app3 and app4 to subClusterB, and app5 and app6 to subClusterC.
Among these, app1, app3, and app5 have completed, and we expect app2, app4, and app6 to remain in the StateStore.
In the normal scenario, following the steps above:
Step 1: We will retrieve six applications [app1, app2, app3, app4, app5, app6] from the StateStore.
Step 2: We will fetch three applications [app2, app4, app6] from the Router's REST interface.
Step 3: By comparing Step 1 and Step 2, we can identify that applications [app1, app3, app5] should be deleted.
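The normal scenario can be replayed with a small, self-contained Java sketch. The class `NormalScenario` and its `appsToDelete` helper are hypothetical illustrations of the Step 3 set difference, not code from this PR; it assumes Java 9+ for `List.of`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical replay of the normal scenario: all three SubClusters respond. */
public class NormalScenario {

  /** Step 3 (sketch): StateStore applications minus the running applications. */
  static Set<String> appsToDelete(List<String> stateStoreApps, List<String> running) {
    Set<String> result = new LinkedHashSet<>(stateStoreApps);
    result.removeAll(running);
    return result;
  }

  public static void main(String[] args) {
    // Step 1: all six applications submitted through routerA.
    List<String> stateStore = List.of("app1", "app2", "app3", "app4", "app5", "app6");
    // Step 2: subClusterA, subClusterB and subClusterC all report their running apps.
    List<String> running = List.of("app2", "app4", "app6");
    // Step 3: only the completed applications are selected for deletion.
    System.out.println(appsToDelete(stateStore, running)); // prints [app1, app3, app5]
  }
}
```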
In the exceptional scenario, following the same steps:
Step 1: We will retrieve six applications [app1, app2, app3, app4, app5, app6] from the StateStore.
Step 2: We fetch the list of running applications from the Router's REST interface. However, because subClusterB and subClusterC are under maintenance, we can only obtain the application running in subClusterA, [app2].
Step 3: By comparing Step 1 and Step 2, we identify applications [app1, app3, app4, app5, app6] for deletion.
In this case, app4 and app6 are deleted by mistake even though they are still running.
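The exceptional scenario can be replayed the same way. In this hypothetical Java sketch (class `ExceptionalScenario` and its helpers are illustrative names, not code from this PR; Java 9+ assumed for `List.of`), Step 2 only sees subClusterA, and intersecting the deletion set with the applications that are actually still running exposes the mistaken deletions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical replay of the exceptional scenario: subClusterB and subClusterC are down. */
public class ExceptionalScenario {

  /** Step 3 (sketch): StateStore applications minus the reported running applications. */
  static Set<String> appsToDelete(List<String> stateStoreApps, List<String> reportedRunning) {
    Set<String> result = new LinkedHashSet<>(stateStoreApps);
    result.removeAll(reportedRunning);
    return result;
  }

  /** Applications selected for deletion even though they are actually still running. */
  static Set<String> wronglyDeleted(Set<String> toDelete, List<String> actuallyRunning) {
    Set<String> result = new LinkedHashSet<>(toDelete);
    result.retainAll(actuallyRunning);
    return result;
  }

  public static void main(String[] args) {
    List<String> stateStore = List.of("app1", "app2", "app3", "app4", "app5", "app6");
    // Ground truth from the example: app2, app4 and app6 are still running.
    List<String> actuallyRunning = List.of("app2", "app4", "app6");
    // Step 2 during maintenance: only subClusterA answers, so only app2 is reported.
    List<String> reported = List.of("app2");

    Set<String> toDelete = appsToDelete(stateStore, reported);
    System.out.println(toDelete);                                  // prints [app1, app3, app4, app5, app6]
    System.out.println(wronglyDeleted(toDelete, actuallyRunning)); // prints [app4, app6]
  }
}
```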
@goiri Can you help review this PR? Thank you very much!
LGTM.
Description of PR
JIRA: YARN-11387. [GPG] YARN GPG mistakenly deleted applicationid.
How was this patch tested?