Can the API give a clear warning message when BatchLogRecordProcessorBuilder's maxQueueSize and maxExportBatchSize are misconfigured? #6454
Comments
Additional: if users misconfigure maxQueueSize < maxExportBatchSize, besides giving a warning message, can we also set maxExportBatchSize = maxQueueSize, so that logs are not lost?
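A minimal sketch of that safeguard, written as a hypothetical helper rather than actual SDK code:

```java
import java.util.logging.Logger;

// Hypothetical helper (not part of the SDK): if the configured batch size exceeds
// the queue size, log a warning and clamp it so that a full batch can actually fit
// in the queue.
final class BatchConfigGuard {
  private static final Logger logger = Logger.getLogger(BatchConfigGuard.class.getName());

  static int effectiveExportBatchSize(int maxQueueSize, int maxExportBatchSize) {
    if (maxExportBatchSize > maxQueueSize) {
      logger.warning(
          "maxExportBatchSize (" + maxExportBatchSize + ") exceeds maxQueueSize ("
              + maxQueueSize + "); clamping maxExportBatchSize to " + maxQueueSize);
      return maxQueueSize;
    }
    return maxExportBatchSize;
  }
}
```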
@tongshushan Are you able to put in a PR to address this?
I have a similar question to @breedx-splk's #7024 (comment): I'm not clear on why maxQueueSize must be greater than maxExportBatchSize to ensure data loss doesn't occur. At the same time, I agree that I would probably recommend configuring maxQueueSize >= maxExportBatchSize. I'd just like to be clear whether we're recommending this as a must to avoid data loss, or as a recommendation / best practice.
These lines of code are the problem: the check that signals the worker only when the queue holds enough records for a full batch, and the wait that otherwise blocks until the next scheduled export time.
Together, they mean that the worker thread is never notified that it's time to export based on the queue filling up. Instead, it always has to wait for the next export time based on the schedule delay. And so, as the issue poster points out, the seemingly benign mistake of setting maxExportBatchSize larger than maxQueueSize means records are silently dropped until that delay elapses. We don't necessarily have to throw an exception when the user misconfigures like this, but we need to fix the behavior so that when the queue fills up, the worker is properly signaled to perform an export.
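A small, self-contained simulation of that interaction (simplified, with assumed names; this is not the SDK's worker code) shows why the queue-full signal can never fire when the batch size exceeds the queue capacity:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Simplified simulation of the signalling problem. The producer only signals the
// worker once the queue holds enough records to fill a batch, but a bounded queue
// smaller than the batch size can never reach that threshold, so the worker sleeps
// until the schedule delay while new records are dropped.
public class SignalSketch {
  public static void main(String[] args) {
    int maxQueueSize = 100;
    int maxExportBatchSize = 512; // misconfigured: larger than the queue capacity
    int recordsNeededForSignal = maxExportBatchSize; // worker waits for a full batch

    ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<>(maxQueueSize);
    int dropped = 0;
    for (int i = 0; i < 1_000; i++) { // producer keeps emitting records
      if (!queue.offer(new Object())) {
        dropped++; // queue is full: the record is silently dropped
      } else if (queue.size() >= recordsNeededForSignal) {
        System.out.println("worker signalled"); // never printed: 100 < 512
      }
    }
    System.out.println("records dropped before the schedule delay elapses: " + dropped);
  }
}
```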
oh, yikes! I totally missed the wait/notify, I was thinking the queue was continuously drained (but I like the cleverness of limiting the context switching 👍)
Hey, thanks folks for chiming in. I did some legwork to see how the other language SDKs handle this; please let me know if that helps and how you'd like to proceed.
Thanks for that @chukunx. I think the opentelemetry-java behavior I described here is a bug. Options to fix:
1. Lenient: accept the configuration and effectively cap the export batch size at the queue size.
2. Fail fast: reject the configuration when maxExportBatchSize > maxQueueSize.
Option 1 represents a more lenient approach: we accept the invalid config and essentially ignore it, since a batch can never be larger than the queue anyway. Option 2 is more rigid, representing the fail-fast mentality. We generally fail fast in this repo, although this is a bit of a special case because nothing is actually broken (after we fix the bug) when maxExportBatchSize exceeds maxQueueSize.
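A sketch of what the two options could look like in a hypothetical builder (the names below are assumed for illustration, not the actual BatchLogRecordProcessorBuilder internals):

```java
// Hypothetical configuration holder showing the two options side by side.
final class ProcessorConfig {
  final int maxQueueSize;
  final int maxExportBatchSize;

  private ProcessorConfig(int maxQueueSize, int maxExportBatchSize) {
    this.maxQueueSize = maxQueueSize;
    this.maxExportBatchSize = maxExportBatchSize;
  }

  // Option 1 (lenient): accept the config and effectively cap the batch size.
  static ProcessorConfig lenient(int maxQueueSize, int maxExportBatchSize) {
    return new ProcessorConfig(maxQueueSize, Math.min(maxExportBatchSize, maxQueueSize));
  }

  // Option 2 (fail fast): reject the invalid config at build time.
  static ProcessorConfig failFast(int maxQueueSize, int maxExportBatchSize) {
    if (maxExportBatchSize > maxQueueSize) {
      throw new IllegalArgumentException(
          "maxExportBatchSize must be less than or equal to maxQueueSize");
    }
    return new ProcessorConfig(maxQueueSize, maxExportBatchSize);
  }
}
```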
Glad that helped @jack-berg! Option 1 sounds better from a backward-compatibility point of view as well. For the implementation I can see two approaches:

a. Add an additional check to this condition (opentelemetry-java/sdk/trace/src/main/java/io/opentelemetry/sdk/trace/export/BatchSpanProcessor.java, Line 257 in cb64451) so that it becomes something like `if (batch.size() >= maxExportBatchSize || batch.size() >= maxQueueSize || System.nanoTime() >= nextExportTime)`, which effectively means an export is triggered once the batch reaches the smaller of the two limits.

b. Compare the two values in the builder and adjust them so that the max batch size does not exceed the max queue size when creating the processor. The Go implementation is a good one to borrow, in my opinion:

```go
if maxExportBatchSize > maxQueueSize {
	if DefaultMaxExportBatchSize > maxQueueSize {
		maxExportBatchSize = maxQueueSize
	} else {
		maxExportBatchSize = DefaultMaxExportBatchSize
	}
}
```

Do you have a preference between the two approaches?
Yes, data loss can happen with the batch processor. This is necessary to protect an application from unbounded resource utilization. Users can detect data loss and reconfigure (turn off instrumentation, reduce the sampling rate, increase the batch processor queue size) by looking at the processor's self-monitoring metrics, which track how many records were dropped.
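For example, a sketch of wiring those metrics up, assuming the log processor builder exposes a setMeterProvider hook like the span processor builder does (the exact self-monitoring metric names vary by SDK version, and the in-memory exporter here is only for illustration):

```java
import io.opentelemetry.sdk.logs.export.BatchLogRecordProcessor;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.testing.exporter.InMemoryLogRecordExporter;

public class DetectDrops {
  public static void main(String[] args) {
    // MeterProvider that will receive the processor's self-monitoring metrics.
    SdkMeterProvider meterProvider = SdkMeterProvider.builder().build();

    BatchLogRecordProcessor processor =
        BatchLogRecordProcessor.builder(InMemoryLogRecordExporter.create())
            .setMaxQueueSize(2048)
            .setMaxExportBatchSize(512)
            .setMeterProvider(meterProvider) // assumed hook, mirroring BatchSpanProcessorBuilder
            .build();

    // Export the metrics (e.g. via a metric reader on the MeterProvider) and watch
    // the processed/dropped counters to detect data loss.
  }
}
```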
I would also want to double check that the batch and spansNeeded fields are being sized / set appropriately, since they are involved in signalling as well.
This is easier to reason about and implement IMO. I think there is a minor semantic difference between the two approaches: the first allows a situation where the worker thread is triggered by the max queue size being reached but then exports a batch bigger than the max queue size, since spans may continue to flow into the queue as it is being drained. But I think we should probably ignore this edge case and opt for the simpler solution unless needed.
Hello,
For BatchLogRecordProcessorBuilder configurations, if maxQueueSize < maxExportBatchSize is misconfigured, can the API give a clear warning message? At present there is no hint message, and the logs will be lost.
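For example, the following builds without any warning and then drops records silently (a minimal reproduction sketch; the in-memory exporter is only for illustration):

```java
import io.opentelemetry.sdk.logs.SdkLoggerProvider;
import io.opentelemetry.sdk.logs.export.BatchLogRecordProcessor;
import io.opentelemetry.sdk.testing.exporter.InMemoryLogRecordExporter;

public class Repro {
  public static void main(String[] args) {
    SdkLoggerProvider loggerProvider =
        SdkLoggerProvider.builder()
            .addLogRecordProcessor(
                BatchLogRecordProcessor.builder(InMemoryLogRecordExporter.create())
                    .setMaxQueueSize(100)       // smaller than the batch size
                    .setMaxExportBatchSize(512) // no warning is emitted
                    .build())
            .build();
    // Emitting more than ~100 log records before the schedule delay elapses
    // results in silent drops.
  }
}
```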
io.opentelemetry: 1.37.0
related link:
#6443
Thanks.