Make Bootstore connection logs less noisy #3891

Open
@andrewjstone

Description

When the bootstore sees a possible peer with a lower IP address via DDM, it attempts to connect to it. If the connection fails, the attempt is retried one second later. On dogfood, we are actually connecting to a peer on a sled that is not in the rack cluster but is running an old, broken version of the software, and this causes log spam. While we should update the sled software, we should also reduce the spamminess of the log messages by throttling them. I don't think we want to add a backoff here, as connecting is a very cheap operation and one that needs to complete quickly in the normal case, or the rack will boot slowly.
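To make the throttling idea concrete, here is a minimal sketch of a log throttle that leaves the one-second retry cadence untouched and only gates how often a repeated message is emitted. It is not taken from the bootstore code: `LogThrottle`, `check`, and the interval used in the example are hypothetical names and values chosen for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of bootstore): decides whether a repeated
/// log message should be emitted, without affecting how often we retry.
pub struct LogThrottle {
    /// Minimum time between two emitted messages.
    min_interval: Duration,
    /// When a message was last emitted, if ever.
    last_emitted: Option<Instant>,
    /// Number of messages dropped since the last emitted one.
    suppressed: u64,
}

impl LogThrottle {
    pub fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_emitted: None, suppressed: 0 }
    }

    /// Returns `Some(n)` if the caller should log now, where `n` is the
    /// number of messages suppressed since the last emitted one.
    /// Returns `None` if the message should be dropped.
    pub fn check(&mut self, now: Instant) -> Option<u64> {
        match self.last_emitted {
            // Still inside the quiet window: count the message and drop it.
            Some(last) if now.duration_since(last) < self.min_interval => {
                self.suppressed += 1;
                None
            }
            // First message, or the quiet window has elapsed: emit and
            // report how many messages were dropped in between.
            _ => {
                self.last_emitted = Some(now);
                Some(std::mem::take(&mut self.suppressed))
            }
        }
    }
}
```

A caller would presumably keep one throttle per peer address and log only when `check(Instant::now())` returns `Some(n)`, including `n` in the message so that the suppressed errors remain visible in aggregate. The connection attempt itself still runs every second, so the normal-case boot path is no slower.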

Here's an example of the messages we are seeing on sled 14 on dogfood, which is talking to sled 27, a sled we verified is not part of the RSS cluster.

{"msg":"Accepted connection from [fdb0:a840:2504:351::1]:39517","v":0,"name":"SledAgent","level":30,"time":"2023-08-17T05:43:23.929173122Z","hostname":"BRM42220051","pid":655,"peer_id":"gimlet-BRM42220051-913-0000019-6","component":"bootstore","file":"bootstore/src/schemes/v0/peer.rs:408"}
{"msg":"Connection error: Close","v":0,"name":"SledAgent","level":40,"time":"2023-08-17T05:43:23.929312476Z","hostname":"BRM42220051","pid":655,"peer_id":"gimlet-BRM42220051-913-0000019-6","component":"bootstore","file":"bootstore/src/schemes/v0/peer_networking.rs:216"}
