Skip to content

output: add metrics for number of writing events in secondary #4971

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 28, 2025

Conversation

Watson1978
Copy link
Contributor

@Watson1978 Watson1978 commented May 20, 2025

Which issue(s) this PR fixes:
Fixes #

What this PR does / why we need it:
Add metrics for number of writing events in secondary to Output plugins
This metrics can be used to determine whether recovery is required from secondary if failure occurs.

Docs Changes:
fluent/fluentd-docs-gitbook#582

Release Note:
Same as the title.

@Watson1978 Watson1978 force-pushed the add-secondary_chunk_count_metrics branch from 3b11f8c to 6679d83 Compare May 21, 2025 02:52
@Watson1978 Watson1978 marked this pull request as ready for review May 21, 2025 07:38
@Watson1978 Watson1978 requested review from kenhys and daipom May 21, 2025 07:38
kenhys
kenhys previously approved these changes May 22, 2025
Copy link
Contributor

@kenhys kenhys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@daipom
Copy link
Contributor

daipom commented May 22, 2025

Hmm, we may need to work out the specifications in more detail.
For example...

  • What this metrics mean exactly:
    • Is it really Number of stored chunks in secondary?
    • Maybe we can express it less misleadingly?
  • The value increases in commit_write, but is it correct?
    • I'm concerned that the backup process calls commit_write as well.
    • What happens if the secondary uses delayed_commit?

For example, if a backup occurs in the secondary and it is also counted up as this metrics, it could be OK as a specification of this metrics.
What we need would be to sort out those specifications.

@Watson1978
Copy link
Contributor Author

I implemented it thinking that the more accurate the value, it is the better.
but if we could realize the number of trying to write secondary, we can realize that restoration is necessary.
It is enough to know that the value is not accurate.

OK, I will improve this PR.

@Watson1978 Watson1978 force-pushed the add-secondary_chunk_count_metrics branch 4 times, most recently from 5ef7bf1 to 83759af Compare May 26, 2025 06:40
@Watson1978 Watson1978 force-pushed the add-secondary_chunk_count_metrics branch 2 times, most recently from 76d87f4 to d066438 Compare May 27, 2025 02:53
@daipom
Copy link
Contributor

daipom commented May 27, 2025

Should we add these getters?
I'm concerned that we don't have some already, though...

def num_errors
@num_errors_metrics.get
end
def emit_count
@emit_count_metrics.get
end
def emit_size
@emit_size_metrics.get
end
def emit_records
@emit_records_metrics.get
end
def write_count
@write_count_metrics.get
end
def rollback_count
@rollback_count_metrics.get
end

@daipom
Copy link
Contributor

daipom commented May 27, 2025

Should we add these getters? I'm concerned that we don't have some already, though...

def num_errors
@num_errors_metrics.get
end
def emit_count
@emit_count_metrics.get
end
def emit_size
@emit_size_metrics.get
end
def emit_records
@emit_records_metrics.get
end
def write_count
@write_count_metrics.get
end
def rollback_count
@rollback_count_metrics.get
end

It seems that we should add the getter because it is unnatural that write_second_count does not have a getter although write_count has a getter.

@Watson1978
Copy link
Contributor Author

Should we add these getters?

I think we can remove these getter methods.
If it defined a getter method, I think we should use appropriately in the following section as well.

stats = {
'emit_records' => @emit_records_metrics.get,
'emit_size' => @emit_size_metrics.get,
# Respect original name
# https://github.com/fluent/fluentd/blob/45c7b75ba77763eaf87136864d4942c4e0c5bfcd/lib/fluent/plugin/in_monitor_agent.rb#L284
'retry_count' => @num_errors_metrics.get,
'emit_count' => @emit_count_metrics.get,
'write_count' => @write_count_metrics.get,
'write_secondary_count' => @write_secondary_count_metrics.get,
'rollback_count' => @rollback_count_metrics.get,
'slow_flush_count' => @slow_flush_count_metrics.get,
'flush_time_count' => @flush_time_count_metrics.get,

Look like we have use these getter methods only used in test codes.
Is it really necessary to define it?

@Watson1978 Watson1978 force-pushed the add-secondary_chunk_count_metrics branch from 499bb84 to d066438 Compare May 27, 2025 05:50
@Watson1978
Copy link
Contributor Author

Watson1978 commented May 27, 2025

Since it is easy to forget to add this manually, it would be better if the getter were generated automatically.

I will try that with another PR.
#4978

@daipom daipom added this to the v1.19.0 milestone May 27, 2025
@daipom daipom changed the title output: add metrics for number of stored chunks in secondary output: add metrics for number of writing events in secondary May 27, 2025
Watson1978 and others added 5 commits May 28, 2025 14:16
@Watson1978 Watson1978 force-pushed the add-secondary_chunk_count_metrics branch from d066438 to 499fb55 Compare May 28, 2025 05:47
@Watson1978 Watson1978 force-pushed the add-secondary_chunk_count_metrics branch from 499fb55 to 5ded991 Compare May 28, 2025 07:48
Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@daipom
Copy link
Contributor

daipom commented May 28, 2025

@Watson1978 Could you update https://docs.fluentd.org/input/monitor_agent#output-example?
It looks very old 😢

@Watson1978
Copy link
Contributor Author

@Watson1978 Could you update https://docs.fluentd.org/input/monitor_agent#output-example? It looks very old 😢

OK, I see :)

@Watson1978
Copy link
Contributor Author

I will update the doc after #4981 .
Because its PR add the another metrics too into output.

@daipom
Copy link
Contributor

daipom commented May 28, 2025

I see!

@daipom daipom merged commit cef2f65 into fluent:master May 28, 2025
13 checks passed
@Watson1978
Copy link
Contributor Author

I created PR to update the doc
fluent/fluentd-docs-gitbook#582

@Watson1978 Watson1978 deleted the add-secondary_chunk_count_metrics branch May 29, 2025 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants