logproto: fixed state handling of the new internal handshake_in_progress flag #388

HofiOne · 2024-11-22T20:51:16Z

There were multiple issues here

proto change did not force re-handshake if needed
initial state handling led to a drop of the first flush event both on the client and server-side that e.g. led to incorrect persist state processing at restart

…ess flag There were multiple issues here - proto change did not force re-handshake if needed - initial state handling led to a drop of the first flush event both on client and server side that e.g. led to incorrect persist state processing at restart Signed-off-by: Hofi <[email protected]>

bazsi · 2024-12-01T13:43:56Z

lib/logproto/logproto-client.h

-    {
-      return s->handshake(s, handshake_finished);
-    }
+  if (log_proto_client_needs_handshake(s))


all our method wrappers check the methods directly and not via another helper function. are there any other uses for log_proto_client_needs_handshake()?

it is an inline func, no penalities at all

it can be a more complex condition for a given proto client, and this can be an overriden impl later if needed

yes, it is reused and called from other places

bazsi · 2024-12-01T13:45:14Z

lib/logreader.c

-    log_proto_server_set_wakeup_cb(self->proto, (LogProtoServerWakeupFunc) log_reader_wakeup, self);
+    {
+      log_proto_server_set_wakeup_cb(self->proto, (LogProtoServerWakeupFunc) log_reader_wakeup, self);
+      self->handshake_in_progress = log_proto_server_needs_handshake(self->proto);


why do we need to reset handshake_in_progress? the idea was to always start in "handshake" mode and the handshake method returns when handshake finishes. we should not need to check that again, at least in my opinion.

I've just checked, handshake_finished is only set to TRUE if either the LogProto claims it has finished the handshake, or when we switch the a LogProto that does not have a handshake method.

bazsi · 2024-12-01T13:53:30Z

lib/logreader.c

-      return log_reader_process_handshake(self);
+      gboolean succ = log_reader_process_handshake(self);
+      if (FALSE == succ || self->handshake_in_progress)
+        return FALSE;


I don't think this is needed.

This what is really needed here.
This check must not return true ever.
If the handshake is needed and it fails or not completed yet, it should return false.
If the handshake is not needed or it is finised fine already, the flow should go forward and finish the flush operation, otherwise this flush will be lost (as the caller might think its flush request is processed and will not attempt it once again).
The latter can be experienced currently, just try my example bellow.

bazsi · 2024-12-01T14:05:00Z

proto change did not force re-handshake if needed

I don't think this is needed. the whole idea is to start in handshake mode and bail out if the LogProto says it's over, or if we are changing the LogProto instance that does not have a handshake method.

initial state handling led to a drop of the first flush event both on client and server side that e.g. led to incorrect persist state processing at restart

Can you elaborate on this?

HofiOne · 2024-12-02T09:33:04Z

proto change did not force re-handshake if needed

I don't think this is needed. the whole idea is to start in handshake mode and bail out if the LogProto says it's over, or if we are changing the LogProto instance that does not have a handshake method.

I do not add my comments one by one to all of the above, just answering here if you do not mind.
In my opinion

proto change might require a repeated handshake anytime, e.g. if both the old and the new proto pairs require it, if it wants to be a generic solution it should be prepared for this case as well
forcibly starting in handshake mode, as the new implementation does, slightly changes the original behavior, the workflow would not be the same now

initial state handling led to a drop of the first flush event both on client and server side that e.g. led to incorrect persist state processing at restart

Can you elaborate on this?

as the result of the second point above, and as the result of the unhandled case in the log_writer_flush when the handshake is finished perfectly immediately (at the first check), or when it is not needed at all (no handler function is set up) will lead to a lost flush

steps to reproduce

create a file source and file destination pair
before starting AxoSyslog create the input of the file source e.g. with 1000 log times (the big number of logs is needed just to be able to reproduce the issue manually easily)
start AxoSyslog (in the foreground for easier debugging)
interrupt it after a few hundred, but before all of the logs are processed, e.g. send it a SIGKILL (or simply press ctrl+c, it does not matter)
restart AxoSyslog

Instead of processing the remaining part of the original 1000 logs from the input file, it will simply drop/ignore them because of the changed log_writer_flush implementation, as the handshake evaluation will return the first time immediately which would lead to lost messages.

bazsi · 2024-12-03T07:51:07Z

proto change did not force re-handshake if needed

I don't think this is needed. the whole idea is to start in handshake mode and bail out if the LogProto says it's over, or if we are changing the LogProto instance that does not have a handshake method.

I do not add my comments one by one to all of the above, just answering here if you do not mind. In my opinion
* proto change might require a repeated handshake anytime, e.g. if both the old and the new proto pairs require it, if it wants to be a generic solution it should be prepared for this case as well

Only the handshake function can replace the proto, so it cannot happen any time. At this point we are still in handshake mode, which is terminated by the new LogProto by setting handshake_finished to TRUE.

* forcibly starting in handshake mode, as the new implementation does, slightly changes the original behavior, the workflow would not be the same now
initial state handling led to a drop of the first flush event both on client and server side that e.g. led to incorrect persist state processing at restart

Can you elaborate on this?

as the result of the second point above, and as the result of the unhandled case in the log_writer_flush when the handshake is finished perfectly immediately (at the first check), or when it is not needed at all (no handler function is set up) will lead to a lost flush

so this is a LogWriter side issue, not a LogReader one. Do you think it applies to LogReader as well?

steps to reproduce
1. create a file source and file destination pair

2. before starting AxoSyslog create the input of the file source e.g. with 1000 log times (the big number of logs is needed just to be able to reproduce the issue manually easily)

3. start AxoSyslog (in the foreground for easier debugging)

4. interrupt it after a few hundred, but before all of the logs are processed, e.g. send it a SIGKILL (or simply press ctrl+c, it does not matter)

5. restart AxoSyslog
Instead of processing the remaining part of the original 1000 logs from the input file, it will simply drop/ignore them because of the changed log_writer_flush implementation, as the handshake evaluation will return the first time immediately which would lead to lost messages.

will try this.

MrAnno · 2024-12-10T11:46:26Z

Fixed by #407

HofiOne force-pushed the fix-logproto-remove-handshake-in-progress branch from f83b6b1 to 61b2abb Compare November 25, 2024 18:59

bazsi reviewed Dec 1, 2024

View reviewed changes

bazsi mentioned this pull request Dec 3, 2024

Fix handshake regression #407

Merged

MrAnno closed this Dec 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

logproto: fixed state handling of the new internal handshake_in_progress flag #388

logproto: fixed state handling of the new internal handshake_in_progress flag #388

HofiOne commented Nov 22, 2024

bazsi Dec 1, 2024

HofiOne Dec 3, 2024

bazsi Dec 1, 2024

bazsi Dec 1, 2024

bazsi Dec 1, 2024

HofiOne Dec 3, 2024

bazsi commented Dec 1, 2024

HofiOne commented Dec 2, 2024 •

edited

Loading

bazsi commented Dec 3, 2024

MrAnno commented Dec 10, 2024

logproto: fixed state handling of the new internal handshake_in_progress flag #388

logproto: fixed state handling of the new internal handshake_in_progress flag #388

Conversation

HofiOne commented Nov 22, 2024

bazsi Dec 1, 2024

Choose a reason for hiding this comment

HofiOne Dec 3, 2024

Choose a reason for hiding this comment

bazsi Dec 1, 2024

Choose a reason for hiding this comment

bazsi Dec 1, 2024

Choose a reason for hiding this comment

bazsi Dec 1, 2024

Choose a reason for hiding this comment

HofiOne Dec 3, 2024

Choose a reason for hiding this comment

bazsi commented Dec 1, 2024

HofiOne commented Dec 2, 2024 • edited Loading

bazsi commented Dec 3, 2024

MrAnno commented Dec 10, 2024

HofiOne commented Dec 2, 2024 •

edited

Loading