HBASE-28513 The StochasticLoadBalancer should support discrete evaluations #6543

Draft
wants to merge 1 commit into
base: master

@rmdmattingly rmdmattingly commented Dec 13, 2024

See my design doc here

To sum it up, the current load balancer isn't great for what it's supposed to do now, and it won't support all of the things that we'd like it to do in a perfect world.

Right now: primary replica balancing squashes all other considerations. The default weight for one of the several cost functions that factor into primary replica balancing is 100,000. Meanwhile the default read request cost is 5. The result is that the load balancer, OOTB, basically doesn't care about balancing actual load. To solve this, you can either set primary replica balancing costs to zero, which is fine if you don't use read replicas, or — if you do use read replicas — maybe you can produce a magic incantation of configurations that work just right, until your needs change.

In the future: we'd like a lot more out of the balancer. System table isolation, meta table isolation, colocation of regions based on start key prefix similarity (this is a very rough idea atm, and not touched in the scope of this PR). And to support all of these features with either cost functions or RS groups would be a real burden. I think what I'm proposing here will be a much, much easier path for HBase operators.

New features

This PR introduces some new features:

  1. Balancer conditional based replica distribution
  2. System table isolation (put backups, quotas, etc. together on one dedicated RegionServer that hosts all system tables)
  3. Meta table isolation (put meta on its own RegionServer)

These can be controlled via:

  • hbase.master.balancer.stochastic.conditionals.distributeReplicas: set this to true to enable conditional based replica distribution
  • hbase.master.balancer.stochastic.conditionals.isolateSystemTables: set this to true to enable system table isolation
  • hbase.master.balancer.stochastic.conditionals.isolateMetaTable: set this to true to enable meta table isolation
  • hbase.master.balancer.stochastic.additionalConditionals: much like cost functions, you can define your own RegionPlanConditional implementations and install them here
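
As a rough illustration, the three boolean flags above could be rendered as hbase-site.xml property entries. The sketch below uses only the JDK to build that fragment. The class and method names (ConditionalsConfigSketch, enabledFlags, toHbaseSiteXml) are made up for this example and are not part of HBase or this PR. additionalConditionals is left out because the format of its value is not described here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of HBase or of this PR. */
final class ConditionalsConfigSketch {
  // Key prefix taken from the configuration names listed in the PR description.
  static final String PREFIX = "hbase.master.balancer.stochastic.conditionals.";

  /** Returns the three boolean flags from the PR description, all enabled. */
  static Map<String, String> enabledFlags() {
    Map<String, String> flags = new LinkedHashMap<>();
    flags.put(PREFIX + "distributeReplicas", "true");
    flags.put(PREFIX + "isolateSystemTables", "true");
    flags.put(PREFIX + "isolateMetaTable", "true");
    return flags;
  }

  /** Renders each entry as an hbase-site.xml property element. */
  static String toHbaseSiteXml(Map<String, String> flags) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : flags.entrySet()) {
      sb.append("<property>\n")
        .append("  <name>").append(e.getKey()).append("</name>\n")
        .append("  <value>").append(e.getValue()).append("</value>\n")
        .append("</property>\n");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Prints a fragment that could be pasted inside <configuration> in hbase-site.xml.
    System.out.print(toHbaseSiteXml(enabledFlags()));
  }
}
```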

Testing

I wrote a lot of unit tests to validate the functionality here — both lightweight and some minicluster tests. Even in the most extreme cases (like, system table isolation + meta table isolation enabled on a 3 node cluster, or the number of read replicas == the number of servers) the balancer does what we'd expect.

Replica Distribution Improvements

Not only does this PR offer an alternative means of distributing replicas, but it's actually a massive improvement on the existing approach.

See the Replica Distribution testing section of my design doc. Cost functions never successfully balance 3 replicas across 3 servers OOTB — but balancer conditionals do so expeditiously.

To summarize the testing, we have replicated_table, a table with 3 region replicas. The 3 regions of a given replica share a color, and there are also 3 RegionServers in the cluster. We expect the balancer to evenly distribute one replica per server across the 3 RegionServers...
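
One way to state that expectation as a check: no RegionServer should host more than one replica of the same region. The sketch below is only an illustration of the goal. ReplicaDistributionCheck and isDistributed are hypothetical names, the map-of-lists input is an assumed representation, and none of it mirrors the PR's RegionPlanConditional API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical check for illustration; not the PR's implementation. */
final class ReplicaDistributionCheck {
  /**
   * @param regionsByServer server name -> ids of the regions it hosts, where
   *        (by assumption) all replicas of one region share the same id
   * @return true if no server hosts two replicas of the same region
   */
  static boolean isDistributed(Map<String, List<String>> regionsByServer) {
    for (List<String> hosted : regionsByServer.values()) {
      Set<String> seen = new HashSet<>();
      for (String regionId : hosted) {
        if (!seen.add(regionId)) {
          return false; // a second replica of regionId landed on this server
        }
      }
    }
    return true;
  }
}
```

With 3 servers and 3 replicas per region, the only layouts that pass this check place exactly one replica of every region on each server.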

Cost functions don't work:
cf1
cf2

….omitting the meaningless snapshots between 4 and 27…

cf28

At this point, I just exited the test because it was clear that our existing balancer would never achieve true replica distribution.

But balancer conditionals do work:
bc1
bc2
bc3
bc4
bc5

New Features: Table Isolation Working as Designed

See below where we ran a new unit test, TestLargerClusterBalancerConditionals, and tracked the locations of regions for 3 tables across 18 RegionServers:

  1. 180 “product” table regions
  2. 1 meta table region
  3. 1 quotas table region

All regions began on a single RegionServer, and within 4 balancer iterations we had a well-balanced cluster and isolation of key system tables. It achieved this in about 2 minutes on my local machine, and most of that time was spent bootstrapping the mini cluster.

output (2)

output (3)

output (5)

output (4)

cc @ndimiduk @charlesconnell @ksravista @aalhour

@rmdmattingly rmdmattingly force-pushed the HBASE-28513 branch 5 times, most recently from e1283f4 to 517c43b Compare December 13, 2024 23:40

rmdmattingly commented Dec 14, 2024

Still cleaning this up with the help of the build logs. Will mark as a draft for now. I believe the code is working quite well though, so please feel free to review the proposal and the meat of the changes.

I'm still deciding whether it's necessary to create a balancer candidate generator for the replica conditional.

@rmdmattingly rmdmattingly marked this pull request as draft December 14, 2024 20:14

rmdmattingly commented Dec 15, 2024

This is working really well in my testing, and I'm not convinced that it's necessary to add a replica distribution candidate generator. This is because, typically, each region replica has so many acceptable destinations (n-r+1, where n is the number of servers and r is the number of replicas), and so many acceptable swap candidates (any region that does not represent the same data). This is different from, say, a table isolation conditional, where we really want to drain virtually all regions from a single RegionServer and no swaps are appropriate.
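
The n-r+1 figure can be worked through with a small sketch. It assumes the other r-1 replicas of the region already sit on distinct servers, and it counts the replica's current server as acceptable. The class and method names are illustrative only.

```java
/** Worked arithmetic for the n - r + 1 figure; names are illustrative. */
final class ReplicaDestinations {
  /**
   * A replica may sit on any server that does not already host one of the
   * other r - 1 replicas of the same region, which leaves
   * n - (r - 1) = n - r + 1 servers (its current server included).
   */
  static int acceptableDestinations(int servers, int replicas) {
    if (replicas < 1 || replicas > servers) {
      throw new IllegalArgumentException("need 1 <= replicas <= servers");
    }
    return servers - replicas + 1;
  }
}
```

For example, acceptableDestinations(1000, 3) is 998, while acceptableDestinations(3, 3) is 1, which is the tight case where the number of replicas equals the number of servers.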

This is probably work for a separate PR, but I think it would be nice to support pluggable candidate generators to pair with any custom conditionals that users write.


@rmdmattingly rmdmattingly force-pushed the HBASE-28513 branch 2 times, most recently from ae58410 to d1622d1 Compare December 16, 2024 02:21

.flatMap(Optional::stream).forEach(RegionPlanConditionalCandidateGenerator::clearWeightCache);
}

void loadConf(Configuration conf) {

You can have BalancerConditionals implement Configurable or BaseConfigurable to do this in a more consistent way
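
A minimal sketch of what that suggestion might look like. To stay self-contained it declares tiny stand-ins for Hadoop's Configuration and Configurable (in HBase these would be org.apache.hadoop.conf.Configuration and org.apache.hadoop.conf.Configurable). BalancerConditionalsSketch, its field, and the default of false are assumptions for illustration, not the PR's actual class.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in so the sketch compiles without Hadoop on the classpath.
final class Configuration {
  private final Map<String, String> values = new HashMap<>();

  void setBoolean(String name, boolean value) {
    values.put(name, Boolean.toString(value));
  }

  boolean getBoolean(String name, boolean defaultValue) {
    String v = values.get(name);
    return v == null ? defaultValue : Boolean.parseBoolean(v);
  }
}

// Stand-in with the same two methods as Hadoop's Configurable interface.
interface Configurable {
  void setConf(Configuration conf);

  Configuration getConf();
}

/** Hypothetical shape only; the PR's real class holds more state than this. */
final class BalancerConditionalsSketch implements Configurable {
  static final String ISOLATE_META_KEY =
    "hbase.master.balancer.stochastic.conditionals.isolateMetaTable";

  private Configuration conf;
  private boolean isolateMetaTable;

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    // Re-read settings whenever a new Configuration is supplied; presumably
    // this is the job loadConf(Configuration) does in the code under review.
    // The default of false is an assumption.
    this.isolateMetaTable = conf.getBoolean(ISOLATE_META_KEY, false);
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  boolean isMetaTableIsolationEnabled() {
    return isolateMetaTable;
  }
}
```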


@rmdmattingly rmdmattingly force-pushed the HBASE-28513 branch 2 times, most recently from 8ac0c7a to 2ca6c63 Compare December 31, 2024 15:22

@rmdmattingly rmdmattingly force-pushed the HBASE-28513 branch 3 times, most recently from e94ba85 to 1cd2c34 Compare January 4, 2025 02:29

rmdmattingly commented Jan 4, 2025

Build looks good here, and all of the balancer tests reliably pass on my machine within about 30min. I've improved the runtime of several tests too, because we previously made a lot of quick assumptions about the appropriate balancer runtime being 30s here, 60s there, and those really add up when running the full test suite repeatedly.

I've also set up a large cluster test for conditional replica balancing, at an identical scale to the existing large cluster test for legacy replica balancing. It demonstrates a significant improvement in balancer latency when dealing with 1k servers, 20k regions, 3 replicas per region, and 100 tables:
Screenshot 2025-01-04 at 11 48 55 AM

Because this PR is huge, and there has been a lot of iteration along the way, I'm tempted to close this and reopen a clean PR with a passing build. I'll do that either this weekend or early next week.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for branch
+1 💚 mvninstall 4m 4s master passed
+1 💚 compile 4m 19s master passed
+1 💚 checkstyle 0m 57s master passed
+1 💚 spotbugs 2m 16s master passed
+1 💚 spotless 0m 56s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for patch
+1 💚 mvninstall 3m 52s the patch passed
+1 💚 compile 4m 13s the patch passed
+1 💚 javac 0m 29s hbase-balancer generated 0 new + 66 unchanged - 4 fixed = 66 total (was 70)
+1 💚 javac 3m 44s hbase-server in the patch passed.
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 57s the patch passed
+1 💚 spotbugs 2m 30s the patch passed
+1 💚 hadoopcheck 13m 40s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 1m 14s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
49m 17s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6543/21/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6543
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 80abf62311b3 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a689f06
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-balancer hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6543/21/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for branch
+1 💚 mvninstall 4m 7s master passed
+1 💚 compile 1m 30s master passed
+1 💚 javadoc 0m 45s master passed
+1 💚 shadedjars 7m 1s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 3m 56s the patch passed
+1 💚 compile 1m 25s the patch passed
+1 💚 javac 1m 25s the patch passed
+1 💚 javadoc 0m 42s the patch passed
+1 💚 shadedjars 6m 38s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 14m 52s hbase-balancer in the patch passed.
+1 💚 unit 185m 39s hbase-server in the patch passed.
232m 50s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6543/21/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6543
Optional Tests javac javadoc unit compile shadedjars
uname Linux 9816d81d30da 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a689f06
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6543/21/testReport/
Max. process+thread count 5313 (vs. ulimit of 30000)
modules C: hbase-balancer hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6543/21/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.
