Plugin memory usage constantly growing #1102

Open
wgpdt opened this issue Jan 9, 2025 · 10 comments
@wgpdt

wgpdt commented Jan 9, 2025

What happened:
We are using an instance of the ClickHouse plugin in Grafana to periodically query a database. The memory usage of the Grafana instance grows continuously over time. This growth can be attributed to the ClickHouse plugin, whose memory increases steadily as it opens connections and queries data, and it eventually causes the process to OOM after several days.

Collecting heap profiles over time indicates that the memory used for database connections keeps increasing.

(pprof) top
Showing nodes accounting for 518.67MB, 97.88% of 529.91MB total
Dropped 109 nodes (cum <= 2.65MB)
Showing top 10 nodes out of 29
      flat  flat%   sum%        cum   cum%
  255.07MB 48.13% 48.13%   255.57MB 48.23%  github.com/ClickHouse/ch-go/compress.NewWriter
  247.47MB 46.70% 94.83%   247.47MB 46.70%  bufio.NewReaderSize (inline)
   16.14MB  3.05% 97.88%    16.14MB  3.05%  github.com/ClickHouse/ch-go/proto.(*Buffer).PutString (inline)
         0     0% 97.88%   230.87MB 43.57%  database/sql.(*DB).PingContext
         0     0% 97.88%   230.87MB 43.57%  database/sql.(*DB).PingContext.func1
         0     0% 97.88%   289.29MB 54.59%  database/sql.(*DB).QueryContext
         0     0% 97.88%   289.29MB 54.59%  database/sql.(*DB).QueryContext.func1
         0     0% 97.88%   502.51MB 94.83%  database/sql.(*DB).conn
         0     0% 97.88%   289.29MB 54.59%  database/sql.(*DB).query
         0     0% 97.88%    17.64MB  3.33%  database/sql.(*DB).queryDC
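
For reference, these heap and goroutine snapshots can be captured repeatedly with Go's standard pprof tooling. A minimal sketch of the mechanism, assuming a Go process with the net/http/pprof handlers enabled (Grafana and its backend plugins expose equivalent endpoints through their own profiling settings; the port below is a placeholder):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side-effect import: registers the /debug/pprof/* routes
)

func main() {
	// Once this is listening, consecutive snapshots can be taken with e.g.
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	//   go tool pprof http://localhost:6060/debug/pprof/goroutine
	// and compared over time to see the growth described above.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
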

Taking consecutive goroutine profiles shows that the number of connectionOpener goroutines is also strictly increasing:

(pprof) top
Showing nodes accounting for 941, 99.79% of 943 total
Dropped 104 nodes (cum <= 4)
      flat  flat%   sum%        cum   cum%
       941 99.79% 99.79%        941 99.79%  runtime.gopark
         0     0% 99.79%          4  0.42%  bufio.(*Reader).Read
         0     0% 99.79%        923 97.88%  database/sql.(*DB).connectionOpener
         0     0% 99.79%          5  0.53%  internal/poll.(*FD).Read
         0     0% 99.79%          7  0.74%  internal/poll.(*pollDesc).wait
         0     0% 99.79%          7  0.74%  internal/poll.(*pollDesc).waitRead (inline)
         0     0% 99.79%          7  0.74%  internal/poll.runtime_pollWait
         0     0% 99.79%          7  0.74%  runtime.netpollblock
         0     0% 99.79%        932 98.83%  runtime.selectgo
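
The connectionOpener count is a useful clue: database/sql starts exactly one connectionOpener goroutine per *sql.DB handle, and that goroutine only exits when Close() is called on the handle. Roughly 900 of them therefore points at hundreds of handles that were opened and never closed, which would also keep alive the bufio and compress buffers seen in the heap profile. A small sketch illustrating the behaviour (it uses clickhouse-go's OpenDB, which does not dial until the handle is actually used, so it runs without a live server; the address is a placeholder):

package main

import (
	"database/sql"
	"fmt"
	"runtime"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	fmt.Println("goroutines before:", runtime.NumGoroutine())

	// Open 100 handles and never close them: each keeps a database/sql
	// connectionOpener goroutine alive, as in the goroutine profile above.
	var handles []*sql.DB
	for i := 0; i < 100; i++ {
		handles = append(handles, clickhouse.OpenDB(&clickhouse.Options{
			Addr: []string{"localhost:9000"}, // placeholder, never dialed
		}))
	}
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines with 100 open handles:", runtime.NumGoroutine())

	// Closing the handles lets the connectionOpener goroutines exit.
	for _, db := range handles {
		_ = db.Close()
	}
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines after Close:", runtime.NumGoroutine())
}
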

What you expected to happen:
The memory usage is stable over time.

Anything else we need to know?:

Some code references are given below:

  • New datasources are created here.
  • Connect opens a SQL DB connection here. Within this, the DB is opened via clickhouse-go, which is ultimately being opened by a connection opener.
  • The ping context also shows continuously increasing memory.

Additionally, it looks like connections are created when a new datasource is created, and a new datasource is created whenever the Grafana config is updated.
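
If that is what's happening, one thing worth checking is whether the previous instance's *sql.DB is closed when a new datasource instance replaces it. A hedged sketch of the relevant SDK hook (the Datasource struct and the clickhouse-go wiring here are illustrative, not the plugin's actual code): implementing instancemgmt.InstanceDisposer from grafana-plugin-sdk-go lets the instance manager close the old pool; without it, every config update would strand a handle together with its connectionOpener goroutine.

package plugin

import (
	"context"
	"database/sql"

	"github.com/ClickHouse/clickhouse-go/v2"
	"github.com/grafana/grafana-plugin-sdk-go/backend"
	"github.com/grafana/grafana-plugin-sdk-go/backend/instancemgmt"
)

// Datasource is an illustrative instance type; the real plugin's struct differs.
type Datasource struct {
	db *sql.DB
}

// NewDatasource matches the instance factory shape used by recent SDK versions.
func NewDatasource(_ context.Context, settings backend.DataSourceInstanceSettings) (instancemgmt.Instance, error) {
	db := clickhouse.OpenDB(&clickhouse.Options{
		Addr: []string{settings.URL}, // placeholder wiring of the configured address
	})
	return &Datasource{db: db}, nil
}

// Dispose is called by the instance manager when the settings change and a new
// instance replaces this one. Without it, the old pool (and its
// connectionOpener goroutine) lives for the rest of the process.
func (d *Datasource) Dispose() {
	_ = d.db.Close()
}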

Environment:

  • Grafana version: Grafana v11.3.0
  • Plugin version: 4.0.3
  • OS Grafana is installed on: Kubernetes (Grafana helm chart)

We also noticed this in plugin version 4.3.2.

@wgpdt wgpdt added datasource/ClickHouse type/bug Something isn't working labels Jan 9, 2025
@SpencerTorres
Collaborator

Hey, thanks for submitting this info, I appreciate the detail. There are a few open issues on the clickhouse-go repository related to memory; let me know if any of those sound similar to what you're observing here: https://github.com/ClickHouse/clickhouse-go/issues

@wgpdt
Author

wgpdt commented Jan 9, 2025

Hi, I do see an issue about a goroutine leak that arises from making queries, although in our goroutine pprof there are no instances of this, only lots of connectionOpeners.

There are other issues related to inserts, but we are only issuing SELECT queries using this datasource.

@SpencerTorres
Collaborator

Thanks for checking those. Could you provide some more details about how you're connecting? Config details such as TLS, HTTP/Native, etc.

@wgpdt
Author

wgpdt commented Jan 10, 2025

We have tried connecting with Native and no TLS, and with HTTP and TLS. Both have the same memory footprint. We also don't set any custom settings, so we would be using the default DialTimeout/QueryTimeout.
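
For reference, a sketch of what those settings correspond to if set explicitly through clickhouse-go; the address, credentials and values below are placeholders rather than our actual config:

package main

import (
	"crypto/tls"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	db := clickhouse.OpenDB(&clickhouse.Options{
		Addr:     []string{"clickhouse.example.com:9440"}, // placeholder host:port
		Protocol: clickhouse.Native,                       // we also tried clickhouse.HTTP
		TLS:      &tls.Config{},                           // nil when testing without TLS
		Auth: clickhouse.Auth{
			Database: "default",
			Username: "default",
			Password: "",
		},
		DialTimeout: 30 * time.Second, // placeholder; the library default applies when unset
	})
	defer db.Close()
}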

@srclosson
Member

@SpencerTorres Just a quick ping here. I don't want to lose momentum. I'll message you.

@SpencerTorres
Collaborator

Hey @srclosson! I haven't had time to look into this yet. It seems like this is the only reported case of memory usage growing in a plugin use case.

Additionally, it looks like connections are created when a new datasource is created, and a new datasource is created if the grafana config is updated.

It's possible there is a memory leak related to connections, but I am also wondering why it's making so many connections in the first place. Perhaps some kind of TCP network configuration causing the connection to drop/fail?
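
While that's being narrowed down, the standard database/sql pool knobs at least bound how many connections a single handle keeps open and how long they live; this would not fix leaked handles themselves, but it limits what each handle can accumulate. A sketch with arbitrary placeholder limits:

package main

import (
	"time"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	db := clickhouse.OpenDB(&clickhouse.Options{
		Addr: []string{"localhost:9000"}, // placeholder
	})
	defer db.Close()

	// Standard library pool limits: cap concurrent and idle connections and
	// recycle them periodically so a single handle cannot grow without bound.
	db.SetMaxOpenConns(5)
	db.SetMaxIdleConns(5)
	db.SetConnMaxLifetime(time.Hour)
	db.SetConnMaxIdleTime(10 * time.Minute)
}
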

@wgpdt which version of ClickHouse is this? Is it self hosted or ClickHouse Cloud?

@wgpdt
Author

wgpdt commented Jan 16, 2025

Hi @SpencerTorres, we're using self-hosted ClickHouse. The issue is likely on the Grafana side, perhaps with how the plugin is used. We are also using Grafana's alert manager, which runs a large number of queries.

@SpencerTorres SpencerTorres self-assigned this Jan 23, 2025
@srclosson
Member

@SpencerTorres Is there something we can do to help move things forward?

@srclosson
Member

@SpencerTorres I really need to hear what the plan is. Where are we in the queue?

@SpencerTorres
Collaborator

@srclosson apologies for the delay. It looks like this is being investigated/addressed in #1154 by @adamyeats.
