Slow Loki label dropdown loading on Grafana #554
Hello @qqu0127 and thanks for the report. We need further details to assist.
Thanks @lmangani for the reply!
And to add more context, we compared it with Grafana Loki side by side, with the same ingestion load and similar storage. Grafana Loki handles the label filtering well, in a reasonable amount of time. It could be due to the metadata indexing it has.
@qqu0127 thanks for the detail, but unless you have an incredible amount of data, the expected performance should be the same as or better than Loki's in most setups. Perhaps the issue is with ClickHouse. What's the load/performance and how many resources have you allocated? If you're using a cluster, have you configured the cluster settings in qryn?
Hello @qqu0127
Thanks in advance.
Thanks. I think I might know the cause. Some of my labels have large value cardinality, like "session_id" and "trace_id", and are essentially unbounded. Unfortunately, I can't run the numbers now because the old data was purged before I saw the message. I cleaned up some labels and am testing with that.
@qqu0127 please share the results of the requests so we can help fix your case. The aspect you have described is one of the design assumptions. The minimal time range qryn respects for labels, series, and values requests is 1 day. If you request series over different time ranges within 1 day, the result will be the same.
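If it helps, here is a minimal sketch for enumerating labels and counting their values through qryn's Loki-compatible API, to spot the high-cardinality ones. The address is a placeholder for your deployment, and since the minimum window is 1 day there is no need to pass start/end for a quick check.

```python
# Minimal sketch: list labels via qryn's Loki-compatible API and count the
# values of each one, to spot high-cardinality labels such as session_id or
# trace_id. QRYN_URL is a placeholder for your deployment.
import requests

QRYN_URL = "http://localhost:3100"  # placeholder address

def label_cardinality():
    labels = requests.get(f"{QRYN_URL}/loki/api/v1/labels", timeout=30).json().get("data", [])
    counts = {}
    for name in labels:
        values = requests.get(
            f"{QRYN_URL}/loki/api/v1/label/{name}/values", timeout=30
        ).json().get("data", [])
        counts[name] = len(values)
    # Highest-cardinality labels first; these are the usual latency/OOM suspects.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for name, count in label_cardinality():
        print(f"{name}: {count} values")
```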
@qqu0127 after a set of improvements and benchmarks I can draw some conclusions.

GZIP compression of the output: the main bottleneck is the gzip compressor, measured with a request for 10M series (1.2 GB of response body) with GZIP.

Grafana restrictions: after a set of updates, qryn no longer fails with OOM on a request for 1.2 GB of series. So I believe it's not worth having more than 10-20K series, as they will not be reviewable anyway.

qryn series hard limit: since v3.2.30, qryn has an env var to configure this. Again, I don't see any reason to set it above 20,000 series, as the Grafana UI doesn't show them anyway.

Feel free to download 3.2.30 and try the upgrades and the new functionality.
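If you want to see how much of the wall time goes to compression on your own deployment, here is a minimal sketch that times the same series request with and without gzip. The address and the {job="app"} selector are placeholders, and whether the server honours Accept-Encoding: identity depends on its HTTP stack.

```python
# Minimal sketch: time the same /series request with and without gzip to see
# how much of the wall time goes to compression. QRYN_URL and the {job="app"}
# selector are placeholders.
import time
import requests

QRYN_URL = "http://localhost:3100"  # placeholder address
PARAMS = {"match[]": '{job="app"}'}  # keep the selector narrow to bound the series count

def timed_series_request(accept_encoding):
    start = time.monotonic()
    resp = requests.get(
        f"{QRYN_URL}/loki/api/v1/series",
        params=PARAMS,
        headers={"Accept-Encoding": accept_encoding},
        timeout=120,
    )
    elapsed = time.monotonic() - start
    resp.raise_for_status()
    return elapsed, len(resp.json().get("data", []))

for encoding in ("gzip", "identity"):  # "identity" asks the server to skip compression
    seconds, series = timed_series_request(encoding)
    print(f"{encoding:8s} {seconds:6.2f}s for {series} series")
```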
Hi @akvlad, sorry it took me some time to get back. As I said, the previous data that caused the issue is gone. I ran the query on my current cluster anyway; the results are:
The latency and OOM issue is resolved by removing some high-cardinality labels.
Thanks for the update @qqu0127
We definitely want to make this work with or without high cardinality, so if you get a chance to try again, let us know.
Hi,
Background:
I deployed qryn with a ClickHouse proxy. The ingestion load is about 2 MB/s. In my setup, qryn is added as a Loki data source in Grafana, and that's what we mostly use for log queries.
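For context, ingestion goes through the Loki-compatible push API. A minimal sketch of what a pushed stream looks like (the address and labels are placeholders, not my actual pipeline); every distinct label set becomes a separate stream, which is why unbounded labels such as trace_id inflate cardinality.

```python
# Minimal sketch (placeholders, not the actual pipeline) of a push to qryn's
# Loki-compatible API. Every distinct label set becomes a separate stream,
# which is why unbounded labels such as trace_id inflate cardinality.
import time
import requests

QRYN_URL = "http://localhost:3100"  # placeholder address

payload = {
    "streams": [
        {
            "stream": {"job": "app", "level": "info"},  # bounded labels are fine
            "values": [[str(time.time_ns()), "hello from a test producer"]],
        }
    ]
}
resp = requests.post(f"{QRYN_URL}/loki/api/v1/push", json=payload, timeout=30)
resp.raise_for_status()
```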
Problem:
Using the Grafana UI, I notice it takes a very long time for the label dropdown options to load. Specifically, it works fine for the first label search; it seems both the label names and values are pre-fetched or indexed. However, for the second and subsequent label filters it takes a long time just to load the label options. It times out and causes OOM in the qryn app.
The second and subsequent label filters are basically unusable for me. I know I can write a native Loki query directly, but that's simply not a good option.
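To take Grafana out of the picture, here is a minimal sketch of roughly what the data source appears to do for the second dropdown: fetch the series matching the first selector and derive the remaining label names. The address and the {job="app"} selector are placeholders for my setup.

```python
# Minimal sketch of roughly what the dropdown does: fetch the series matching
# the first selector, then derive the remaining label names for the second
# filter. QRYN_URL and the {job="app"} selector are placeholders.
import requests

QRYN_URL = "http://localhost:3100"  # placeholder address

resp = requests.get(
    f"{QRYN_URL}/loki/api/v1/series",
    params={"match[]": '{job="app"}'},
    timeout=120,
)
resp.raise_for_status()
series = resp.json().get("data", [])
remaining_labels = sorted({key for stream in series for key in stream if key != "job"})
print(f"{len(series)} series, candidate labels for the second filter: {remaining_labels}")
```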
My findings so far:
I found this in the qryn log:
I'm not a SQL expert, but I feel this query is inefficient. Does anyone have a clue?
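One way to pin down which statement is slow on the ClickHouse side is to read system.query_log (enabled by default) over the ClickHouse HTTP interface; a minimal sketch, with the host as a placeholder:

```python
# Minimal sketch: ask ClickHouse for its slowest recent statements via the
# HTTP interface (port 8123) and system.query_log. The host is a placeholder.
import requests

CLICKHOUSE_URL = "http://clickhouse:8123"  # placeholder address

SQL = """
SELECT
    query_duration_ms,
    formatReadableSize(memory_usage) AS memory,
    read_rows,
    substring(query, 1, 200) AS query_head
FROM system.query_log
WHERE type = 'QueryFinish' AND event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 5
FORMAT TSVWithNames
"""

resp = requests.post(CLICKHOUSE_URL, data=SQL, timeout=60)
resp.raise_for_status()
print(resp.text)
```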