Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logproto: fixed state handling of the new internal handshake_in_progress flag #388

Conversation

HofiOne
Copy link
Contributor

@HofiOne HofiOne commented Nov 22, 2024

There were multiple issues here

  • proto change did not force re-handshake if needed
  • initial state handling led to a drop of the first flush event both on the client and server-side that e.g. led to incorrect persist state processing at restart

Signed-off-by: Hofi [email protected]

…ess flag

There were multiple issues here
- proto change did not force re-handshake if needed
- initial state handling led to a drop of the first flush event both on client and server side that e.g. led to incorrect persist state processing at restart
Signed-off-by: Hofi <[email protected]>
@HofiOne HofiOne force-pushed the fix-logproto-remove-handshake-in-progress branch from f83b6b1 to 61b2abb Compare November 25, 2024 18:59
{
return s->handshake(s, handshake_finished);
}
if (log_proto_client_needs_handshake(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all our method wrappers check the methods directly and not via another helper function. are there any other uses for log_proto_client_needs_handshake()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • it is an inline func, no penalities at all
  • it can be a more complex condition for a given proto client, and this can be an overriden impl later if needed
  • yes, it is reused and called from other places

log_proto_server_set_wakeup_cb(self->proto, (LogProtoServerWakeupFunc) log_reader_wakeup, self);
{
log_proto_server_set_wakeup_cb(self->proto, (LogProtoServerWakeupFunc) log_reader_wakeup, self);
self->handshake_in_progress = log_proto_server_needs_handshake(self->proto);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to reset handshake_in_progress? the idea was to always start in "handshake" mode and the handshake method returns when handshake finishes. we should not need to check that again, at least in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just checked, handshake_finished is only set to TRUE if either the LogProto claims it has finished the handshake, or when we switch the a LogProto that does not have a handshake method.

return log_reader_process_handshake(self);
gboolean succ = log_reader_process_handshake(self);
if (FALSE == succ || self->handshake_in_progress)
return FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This what is really needed here.
This check must not return true ever.
If the handshake is needed and it fails or not completed yet, it should return false.
If the handshake is not needed or it is finised fine already, the flow should go forward and finish the flush operation, otherwise this flush will be lost (as the caller might think its flush request is processed and will not attempt it once again).
The latter can be experienced currently, just try my example bellow.

@bazsi
Copy link
Member

bazsi commented Dec 1, 2024

  • proto change did not force re-handshake if needed

I don't think this is needed. the whole idea is to start in handshake mode and bail out if the LogProto says it's over, or if we are changing the LogProto instance that does not have a handshake method.

  • initial state handling led to a drop of the first flush event both on client and server side that e.g. led to incorrect persist state processing at restart

Can you elaborate on this?

@HofiOne
Copy link
Contributor Author

HofiOne commented Dec 2, 2024

  • proto change did not force re-handshake if needed

I don't think this is needed. the whole idea is to start in handshake mode and bail out if the LogProto says it's over, or if we are changing the LogProto instance that does not have a handshake method.

I do not add my comments one by one to all of the above, just answering here if you do not mind.
In my opinion

  • proto change might require a repeated handshake anytime, e.g. if both the old and the new proto pairs require it, if it wants to be a generic solution it should be prepared for this case as well
  • forcibly starting in handshake mode, as the new implementation does, slightly changes the original behavior, the workflow would not be the same now
  • initial state handling led to a drop of the first flush event both on client and server side that e.g. led to incorrect persist state processing at restart

Can you elaborate on this?

as the result of the second point above, and as the result of the unhandled case in the log_writer_flush when the handshake is finished perfectly immediately (at the first check), or when it is not needed at all (no handler function is set up) will lead to a lost flush

steps to reproduce

  1. create a file source and file destination pair
  2. before starting AxoSyslog create the input of the file source e.g. with 1000 log times (the big number of logs is needed just to be able to reproduce the issue manually easily)
  3. start AxoSyslog (in the foreground for easier debugging)
  4. interrupt it after a few hundred, but before all of the logs are processed, e.g. send it a SIGKILL (or simply press ctrl+c, it does not matter)
  5. restart AxoSyslog

Instead of processing the remaining part of the original 1000 logs from the input file, it will simply drop/ignore them because of the changed log_writer_flush implementation, as the handshake evaluation will return the first time immediately which would lead to lost messages.

@bazsi
Copy link
Member

bazsi commented Dec 3, 2024

  • proto change did not force re-handshake if needed

I don't think this is needed. the whole idea is to start in handshake mode and bail out if the LogProto says it's over, or if we are changing the LogProto instance that does not have a handshake method.

I do not add my comments one by one to all of the above, just answering here if you do not mind. In my opinion

* proto change might require a repeated handshake anytime, e.g. if both the old and the new proto pairs require it, if it wants to be a generic solution it should be prepared for this case as well

Only the handshake function can replace the proto, so it cannot happen any time. At this point we are still in handshake mode, which is terminated by the new LogProto by setting handshake_finished to TRUE.

* forcibly starting in handshake mode, as the new implementation does, slightly changes the original behavior, the workflow would not be the same now
  • initial state handling led to a drop of the first flush event both on client and server side that e.g. led to incorrect persist state processing at restart

Can you elaborate on this?

as the result of the second point above, and as the result of the unhandled case in the log_writer_flush when the handshake is finished perfectly immediately (at the first check), or when it is not needed at all (no handler function is set up) will lead to a lost flush

so this is a LogWriter side issue, not a LogReader one. Do you think it applies to LogReader as well?

steps to reproduce

1. create a file source and file destination pair

2. before starting AxoSyslog create the input of the file source e.g. with 1000 log times (the big number of logs is needed just to be able to reproduce the issue manually easily)

3. start AxoSyslog (in the foreground for easier debugging)

4. interrupt it after a few hundred, but before all of the logs are processed, e.g. send it a SIGKILL (or simply press ctrl+c, it does not matter)

5. restart AxoSyslog

Instead of processing the remaining part of the original 1000 logs from the input file, it will simply drop/ignore them because of the changed log_writer_flush implementation, as the handshake evaluation will return the first time immediately which would lead to lost messages.

will try this.

@bazsi bazsi mentioned this pull request Dec 3, 2024
@MrAnno
Copy link
Member

MrAnno commented Dec 10, 2024

Fixed by #407

@MrAnno MrAnno closed this Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants