
Add Buffer debloating blog post #524

Open · wants to merge 6 commits into base: asf-site

Conversation


@akalash akalash commented Apr 5, 2022

No description provided.


@smattheis smattheis left a comment


Section 1 (What is the article about?)

  • "instabilities", do you mean "imbalances" or "varying conditions" (instabilities refer rather to failures than to stable conditions)
  • why "minimum possible overhead", do you mean "minimum latency" (typical duo is "maximum throughput" and/vs "minimum latency")
  • simplify example: instead of record1, record2 etc. It's not clear to me what the example should illustrate.
  • What is the problem that buffer debloating tries to solve? (Also it is not mentioned that "buffer debloating" refers to network buffers and how they can bloat.)

Section 2

  • For Java non-networking/Netty-affiliated people, make clear that the term "buffer" is used here both as a data structure to store records and as the unit that is transferred via the network.
  • I don't understand "network buffer is a good tread off". You mean trade-off ... a buffer cannot be a trade-off.

Section 3 (What else can go wrong?)

  • Here, you describe the actual problem. This is too late. The problem (on some abstract level) should be described in the summary at the top. Up to here, I actually don't know what it's all about. Then, step by step, break it down and describe aspects like the relation to backpressure. At the finest level, describe the concrete mechanics as an example but explain on an abstract level. This makes it possible to understand. (Example: The problem is X. The root cause of X is Y and how it interacts with Z. For example, as illustrated, if y1 and y2 happen, z1 and z2 cause y3 ... and finally X.)

Section 4 & 5

  • Seems okay so far. Maybe some details which we can address later.

General:

  • Remove title "What is this article about?"
  • The text is very verbose. It should be more concise.
  • Add spaces before parentheses
  • Don't use "you" form if not really necessary and in this text "you" is not necessary (instead speak about the application or the system not about the reader "you")
  • Also don't switch between forms (I vs. you), avoid both

@akalash
Author

akalash commented Apr 12, 2022

"instabilities", do you mean "imbalances" or "varying conditions" (instabilities refer rather to failures than to stable conditions)

Let's discuss it later, but I think "instabilities" is the more correct word, since I mean situations where the conditions/environment are not stable (for example, constantly losing and re-establishing network connections).

why "minimum possible overhead", do you mean "minimum latency" (typical duo is "maximum throughput" and/vs "minimum latency")

Latency in general is not so important here. The latency matters for aligned checkpoints, and memory usage matters for unaligned checkpoints. So I am trying to find the correct word for both of these cases. Perhaps something like "maximum throughput" vs. "minimum effective memory" (or "memory overhead").

For Java non-networking/Netty affiliated people, make clear that the term buffer is here used as a data structure to store records but also as the unit that is transfered via network.

Isn't that exactly what is described in "Network buffer"? -- Flink gathers records into the network buffer, which is the smallest unit of data sent to subtasks. Perhaps it makes sense to rephrase it, but I think the explanation is exactly as you described.

@smattheis

With "instabilities" you are in fact right, because the parameters are not stable. Nevertheless, I find it confusing, because instability usually refers to some "collapse" scenario, which is not the main aspect here, right? I find "varying conditions" a bit more precise here, because varying conditions require adaptation, which is what buffer debloating tries to achieve.

"gather records" and "smallest unit of sending data" is a bit confusing/non-technical. For example: Flink buffers outbound records into so-called network buffers, which are, roughly speaking, byte arrays of limited size and represent the smallest unit of data that is then sent via the network.
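To make the reviewer's description concrete, here is a minimal sketch of the idea that a network buffer is a fixed-capacity byte container and the smallest unit of transfer. This is purely illustrative; the class and function names are invented for this example and do not reflect Flink's actual implementation.

```python
# Illustrative sketch (not Flink's actual code): serialized records are
# packed into fixed-size byte buffers; a full buffer is "sent" as one
# unit, mirroring how a network buffer is the smallest unit transferred.
class NetworkBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = bytearray()

    def try_append(self, record: bytes) -> bool:
        """Append a serialized record if it still fits into this buffer."""
        if len(self.data) + len(record) > self.capacity:
            return False  # buffer full, must be flushed (sent) first
        self.data.extend(record)
        return True


def buffer_records(records, capacity):
    """Pack serialized records into buffers of at most `capacity` bytes."""
    sent = []
    buf = NetworkBuffer(capacity)
    for r in records:
        if not buf.try_append(r):
            sent.append(bytes(buf.data))  # flush the full buffer
            buf = NetworkBuffer(capacity)
            buf.try_append(r)
    if buf.data:
        sent.append(bytes(buf.data))  # flush the remainder
    return sent
```

With a 4-byte capacity, three 2-byte records end up as two transfer units, which is exactly the "smallest unit of data" intuition being discussed.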


@smattheis smattheis left a comment


I like it, and the structure looks really good. I have some comments below, and I think the rest is then a bit of tuning formulations. :)

Summary

  • Looks good.

Network stack

  • Looks good.

Buffer debloating

  • Introduce the term "buffer debloating" earlier in this section, e.g., after the explanation about "time instead of amount of data".
  • Question here: For the case of backpressure and no backpressure, what is exactly meant? Is it backpressure to the upstream task or from downstream task? I assume the first, but I would suggest to make this explicit here.

How does it work?

  • I really like that you open up the section from a top-level perspective: two parameters, current implementation adjusts via buffer size only.
  • Is it important to note that, in theory, one could adjust both parameters? Or is there a conceptual reason to adjust only buffer size? If not, I would remove that sentence.
  • What I would add here is the implementation detail that it doesn't change the actual buffer size but limits the usage of each buffer to some adaptable maximum. As a consequence, it does not reduce the actual heap memory usage but reduces the amount of data and, hence, comes with the mentioned benefits.
  • Maybe introduce numbered lists to describe steps, e.g.:
    1. Consistently calculate throughput.
    2. Calculate buffer size
    3. Adapt smoothly.
  • I would remove the last sentence "It is why some specific cases can be ...". I understand that you introduce the next section, but it's not necessary. The headline of the next section does that.
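The three steps listed above (measure throughput, derive a buffer size, adapt smoothly) can be sketched roughly as follows. This is an illustrative assumption of how such a loop could look: the function name, the exact formula, the smoothing factor, and the size bounds are all invented for the example and are not Flink's actual implementation.

```python
# Rough sketch of the debloating steps discussed above (illustrative
# only; names, formula details, and constants are assumptions, not
# Flink's actual code).
def debloated_buffer_size(throughput_bytes_per_s: float,
                          target_time_s: float,
                          num_buffers: int,
                          prev_size: float,
                          alpha: float = 0.5,
                          min_size: int = 256,
                          max_size: int = 32 * 1024) -> int:
    # 1. Throughput is assumed to be measured continuously elsewhere.
    # 2. Size buffers so that all in-flight data can be consumed within
    #    the configured target time.
    desired = throughput_bytes_per_s * target_time_s / num_buffers
    # 3. Adapt smoothly: an exponential moving average avoids abrupt
    #    jumps in buffer size between measurements.
    smoothed = alpha * desired + (1 - alpha) * prev_size
    # Clamp to sane bounds so a single outlier cannot collapse or
    # explode the buffer size.
    return int(min(max(smoothed, min_size), max_size))
```

Note that, as the review points out, the real mechanism does not resize the underlying byte arrays; it only caps how much of each buffer may be used.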

Usage

  • Looks good, but why is the 'Multiple inputs and unions' section from the linked manual missing?
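For context on the 'Usage' section being reviewed: enabling buffer debloating in Flink comes down to a couple of configuration options. The keys below are taken from the Flink documentation (available since Flink 1.14); the target value shown is its documented default, used here purely as an example.

```yaml
# flink-conf.yaml — enable adaptive buffer sizing (buffer debloating)
taskmanager.network.memory.buffer-debloat.enabled: true
# Target time for a subtask to consume its in-flight data (default: 1s)
taskmanager.network.memory.buffer-debloat.target: 1s
```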

@akalash
Author

akalash commented Apr 29, 2022

Question here: For the case of backpressure and no backpressure, what is exactly meant? Is it backpressure to the upstream task or from downstream task? I assume the first, but I would suggest to make this explicit here.

I didn't get the question. It is backpressure to the upstream task from the downstream task (there is no "or" there). So it means that the current task cannot proceed because its output buffers are full. Maybe I need to clarify something, but I don't really understand what exactly.

Is it important to note that, in theory, one could adjust both parameters? Or is there a conceptual reason to adjust only buffer size? If not, I would remove that sentence.

I think it is important to note, since if the user knows about the two parameters (buffer size, number of buffers), the obvious question is what buffer debloating changes and how. Ideally, we should change both of these parameters, but for simplicity (at least in the first implementation) we decided to change only the buffer size. Perhaps we can just simplify this sentence to something like: "Currently, buffer debloating manages memory usage by adjusting only the size of the network buffers, while the number of network buffers always remains constant."

Looks good but why is 'Multiple inputs and unions' section from the linked manual missing?

We actually already implemented that, so we should fix the original documentation instead. At least we need to check what was already implemented. I will do that.

@smattheis

To checkpointing:

Question here: For the case of backpressure and no backpressure, what is exactly meant? Is it backpressure to the upstream task or from downstream task? I assume the first, but I would suggest to make this explicit here.

I didn't get the question. It is backpressure to the upstream task from the downstream task (there is no "or" there). So it means that the current task cannot proceed because its output buffers are full. Maybe I need to clarify something, but I don't really understand what exactly.

You're right, I find my question confusing myself. What I actually mean is: why do you distinguish between backpressure and no-backpressure scenarios here? That's not so intuitive to me, because the benefits could theoretically (I admit only theoretically and probably never in practice) be visible even without backpressure, i.e., if input buffers have a high fill level while never being completely full, such that buffer debloating still limits the data in the buffers and reduces checkpoint time and checkpoint size likewise. Nevertheless, if I get your idea right, I would suggest adding a little transition to make your intuition explicit, e.g.:

The benefits of buffer debloating are best visible when backpressure occurs. The reason is that no backpressure means that the downstream task is processing input faster than (or equally fast as) data arrives such that its input buffer is barely filled and, consequently, buffer debloating has almost no visible effect.

This is the opposite when backpressure occurs as it means input buffers are frequently filled. Since buffer debloating limits the amount of data in the input buffer, we observe the following benefits with regard to checkpointing:

...

To buffer debloating mechanism:

I think it is important to note, since if the user knows about the two parameters (buffer size, number of buffers), the obvious question is what buffer debloating changes and how. Ideally, we should change both of these parameters, but for simplicity (at least in the first implementation) we decided to change only the buffer size.

I would add a side phrase with exactly that.

For simplicity, buffer debloating currently only caps the maximal used buffer size ...

and at the end of the paragraph mention

Nevertheless, the benefits of buffer debloating with regard to checkpointing are effective as described before.

Other comments:

I think you accidentally reverted some changes, e.g., in the summary: 2f5be59


@smattheis smattheis left a comment


I think what remains are some typos and grammar issues. Besides that, we need a few corrections in the GIFs:

  • buffer-debloat.gif: Keep the arrows with buffer size update longer because they disappear too fast.
  • checkpoint-barrier.png: It's a static image. Is that as you wanted it?
  • simple_problem.gif is currently included twice: first in img/blog/ and second in content/img/blog. Also, the label "Network/Local connection" oscillates between having and not having a line break. And the label "Waiting for network" is not at the same height for the subtasks.

@MartijnVisser
Contributor

@akalash Didn't you want to ultimately complete this blog post?

@akalash
Author

akalash commented Feb 24, 2023

@akalash Didn't you want to ultimately complete this blog post?

Oh, that is a good point. I surely have to find time for this.
