HADOOP-19093. [ABFS] Improve rate limiting through ABFS in Manifest Committer #6596
Conversation
@@ -489,6 +489,7 @@ protected Path getUniquePath(String filepath) {
         .right(UUID.randomUUID().toString(), SHORTENED_GUID_LEN));
   }
cut
@@ -545,7 +545,8 @@ private void resetStatisticsContext() {
    * @param futures futures.
    * @param sleepInterval Interval in milliseconds to await completion.
    */
-  private static void waitFor(Collection<Future<?>> futures, int sleepInterval) {
+  @InterfaceAudience.Private
+  public static void waitFor(Collection<Future<?>> futures, int sleepInterval) {
or move to FutureIO and cross-reference from here. The cleaner solution.
I'm going to add rate limiting for all IO within the committer. Currently there's a rate limiter for the committer rename (in the FS instance, so shared across all threads) but nothing else. My plan is to add a new interface in hadoop-common.
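For illustration, a minimal sketch of what such a FutureIO helper might look like; the class placement, polling strategy, and exception handling here are assumptions, not the patch's code:

```java
import java.util.Collection;
import java.util.concurrent.Future;

/** Sketch only: a polling waitFor() as it might appear in FutureIO. */
public final class FutureIOWaitSketch {

  private FutureIOWaitSketch() {
  }

  /**
   * Block until every future has completed, rescanning the collection
   * and sleeping between scans rather than blocking on each future.
   */
  public static void waitFor(Collection<Future<?>> futures, int sleepInterval)
      throws InterruptedException {
    while (futures.stream().anyMatch(f -> !f.isDone())) {
      Thread.sleep(sleepInterval);
    }
  }
}
```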
Force-pushed from e250fff to bfd6a46.
Testing on an overloaded azure store; hit some timeouts and an assertion failure in
The latest PR revision adds op-specific capacity multiplication to the abfs store; the numbers are pretty arbitrary as this is only used in the committer, and could be tuned in future.
Uses wait/notify to try and block threads; doesn't seem to work.

Change-Id: I2c29968dc666c8e9e18bc59ea83ce59cea70bf52

Don't seem able to generate enough load to force the problem.

Change-Id: I35af099611dcc0dc90719d0abc951935c9f3d930
* Hadoop common: add an API RateLimiterSource.
* ABFS exposes its existing rate limiter through the API.
* Manifest committer asks for IO capacity on all IO operations.
* New option `mapreduce.manifest.committer.cleanup.directory.write.capacity`: number of write IOPS to request for every dir deletion.
* Azure storage `fs.azure.io.rate.limit` cut back to 1000. Goal: reduce risk of overload.
* Task attempt commit will reattempt to save and rename the manifest multiple times (3) so that race conditions in TA commit are more likely to be recovered from.

Change-Id: I2038d9b8c1885c4d49c6b191a5b5f4e59779ef04
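As a sketch of how a job might set these options (only the option names and the reduced 1000 limit come from the commit; the cleanup-capacity value of 100 is an arbitrary illustration):

```java
import org.apache.hadoop.conf.Configuration;

public class CommitterRateLimitConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Write IOPS to request for every directory deletion during cleanup;
    // the value 100 is illustrative, not a recommended setting.
    conf.setInt(
        "mapreduce.manifest.committer.cleanup.directory.write.capacity", 100);
    // Overall ABFS IO rate limit; 1000 is the reduced value from this change.
    conf.setInt("fs.azure.io.rate.limit", 1000);
  }
}
```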
Rate limiter API is now IORateLimiter.

Method: Duration acquireIOCapacity(String operation, int requestedCapacity)
+ static constructors for unlimited and restricted.
+ tests

This is implemented in AbfsStore, exported via the ABFS FileSystem. Manifest committer has values for different operations to ask for (just made up) and a configured value for directory delete, which gets passed down from the config. Fairly complex work splitting the current deletion code up into deleteFile and deleteDir.

Change-Id: I3b138974ec81a3120a14bfede4b2535a6abb20fe
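Based on the signature given above, the interface might look roughly like this; a sketch only, since the javadoc wording is assumed and a later revision of the patch adds optional source/dest path arguments:

```java
import java.time.Duration;

/**
 * Sketch of the IORateLimiter API described in the commit message;
 * details beyond the method signature are assumptions.
 */
public interface IORateLimiter {

  /**
   * Acquire IO capacity for a named operation, blocking until granted.
   * @param operation standard operation name, e.g. "rename"
   * @param requestedCapacity units of IO capacity to request
   * @return the time spent waiting for the capacity
   */
  Duration acquireIOCapacity(String operation, int requestedCapacity);
}
```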
* Move IORateLimiter implementation support from the public interface to org.apache.hadoop.fs.impl.IORateLimiterSupport.
* The abfs store has a multiplier for higher-cost operations (list, rename, ...) (not delete).
* Tests that the abfs multiplier works all the way through ManifestStoreOperations.
* Because the multiplication is done in abfs, the ManifestCommitter itself sets the capacity of all single-file operations it wants to "1".

Change-Id: I8a211318ce26b49f6339ba67a52f611839ae0099
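To illustrate the multiplier idea (the operation names and cost factors below are invented for the example, not the patch's values):

```java
/** Sketch: scaling requested capacity by per-operation cost in the store. */
public final class OperationCostSketch {

  private OperationCostSketch() {
  }

  /**
   * Scale the caller's requested capacity ("1" for single-file operations
   * in the committer) by the store's cost factor for the operation.
   */
  public static int effectiveCapacity(String operation, int requested) {
    switch (operation) {
      case "rename":
      case "liststatus":
        // higher-cost operations; the factor 2 is purely illustrative
        return requested * 2;
      default:
        // delete and other single-file operations are not multiplied
        return requested;
    }
  }
}
```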
Given up trying to make the load test work; instead converted it to a scale test. If it ever does trigger a failure, the output should be insightful.

Change-Id: I15e78c7006a99661d62113c253679a27cbf914f8
Force-pushed from b3092b6 to 70fe8c4.
I am having fun testing at scale with a throttled store. Will need to isolate
I intend to pull the hadoop-common side of things out to be a self-contained patch.

Change-Id: I18e32907abccdfa30c28ba5e1f2e15ae58012f69
Change-Id: I24169c97ef11e78b93d0107165027b4ea3d08b2c
Adds an API (pulled from apache#6596) to allow callers to request IO capacity for a named operation with optional source and dest paths.

Change-Id: I02aff4d3c90ac299c80f388e88195d69e1049fe0
Description of PR
HADOOP-19093. Improve rate limiting through ABFS in Manifest Committer
Adds a hadoop-common API `IORateLimiter` with a method to acquire IO capacity for a specific operation. Because the operation is passed down using standard names, clients can map the requested "client capacity" to their actual IO capacity (limited to the specific IO instances).
As examples:
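For instance, a committer could ask the store for capacity before renaming a manifest. A minimal sketch, assuming a FileSystem that implements the IORateLimiter interface sketched earlier; the class, method, and path names here are illustrative, not the patch's code:

```java
import java.io.IOException;
import java.time.Duration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class RateLimitedRenameExample {

  private RateLimitedRenameExample() {
  }

  /** Rename a task manifest after requesting write capacity. */
  public static void commitManifest(FileSystem fs, Path tmp, Path dest)
      throws IOException {
    if (fs instanceof IORateLimiter) {
      // Block until the store grants capacity; the wait time could be
      // recorded in IOStatistics for diagnostics.
      Duration waited = ((IORateLimiter) fs).acquireIOCapacity("rename", 1);
    }
    if (!fs.rename(tmp, dest)) {
      throw new IOException("failed to rename " + tmp + " to " + dest);
    }
  }
}
```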
This API is intended for applications which really want to limit their load across bulk operations; otherwise it can be hidden within the actual FileSystem clients. And as it is restricted to a single process, it is only effective for reducing the load of those bulk operations.
For S3A I'd use it in the new bulk delete call of #6494
* New API, so no regression.
* Avoids aggressive multithreaded table compaction overloading an S3 classic bucket.
* We can document it from the outset: "you will be rate limited".
Adds a new test which can be explicitly launched to try to trigger rename throttling. This doesn't work, as you need to generate a lot of load; unlike the S3 bulk delete 1K-paths "feature", you cannot generate enough load from a single machine.
For code changes:
* If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?