query timing out: daily active users #862

Closed
csillag opened this issue Dec 22, 2024 · 1 comment · Fixed by #622
Comments

@csillag commented Dec 22, 2024

This query was recently made more efficient in #607 (cc @Andrew7234), but now it's consistently timing out again.

From mainnet staging logs:

WITH relevant_rounds AS (
    SELECT min(round) AS min_round, max(round) AS max_round
    FROM chain.runtime_blocks AS b
    WHERE (b.runtime = $1 AND b.timestamp >= $2::timestamptz AND b.timestamp < $3::timestamptz)
)
SELECT COUNT(DISTINCT account_address)
FROM chain.runtime_related_transactions AS rt, relevant_rounds
WHERE (rt.runtime = $1 AND tx_round >= relevant_rounds.min_round AND tx_round <= relevant_rounds.max_round)

args: [
  0: "emerald"
  1: "2022-03-15T02:20:00Z"
  2: "2022-03-16T02:20:00Z"
]

Discussion:

@Andrew7234:

oasisindexer=> explain analyze WITH relevant_rounds AS (
        SELECT min(round) as min_round, max(round) as max_round
        FROM chain.runtime_blocks AS b
        WHERE (b.runtime = 'emerald' AND b.timestamp >= '2022-03-15T02:20:00Z' AND b.timestamp < '2022-03-16T02:20:00Z')
)
SELECT COUNT(DISTINCT account_address)
FROM chain.runtime_related_transactions AS rt, relevant_rounds
WHERE (rt.runtime = 'emerald' AND tx_round >= relevant_rounds.min_round AND tx_round <= relevant_rounds.max_round);
                                                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6654916.70..6654916.71 rows=1 width=8) (actual time=18810.986..18810.994 rows=1 loops=1)
   ->  Nested Loop  (cost=705652.14..6610085.65 rows=17932419 width=47) (actual time=4268.994..12804.725 rows=1452131 loops=1)
         ->  Aggregate  (cost=22921.23..22921.24 rows=1 width=16) (actual time=369.592..369.597 rows=1 loops=1)
               ->  Index Scan using ix_runtime_blocks_timestamp on runtime_blocks b  (cost=0.56..22890.94 rows=6057 width=8) (actual time=2.250..367.111 rows=9937 loops=1)
                     Index Cond: ((runtime = 'emerald'::runtime) AND ("timestamp" >= '2022-03-15 02:20:00+00'::timestamp with time zone) AND ("timestamp" < '2022-03-16 02:20:00+00'::timestamp with time zone))
         ->  Bitmap Heap Scan on runtime_related_transactions rt  (cost=682730.91..6407840.22 rows=17932419 width=55) (actual time=3899.384..12176.384 rows=1452131 loops=1)
               Recheck Cond: ((runtime = 'emerald'::runtime) AND ((tx_round)::bigint >= (min((b.round)::bigint))) AND ((tx_round)::bigint <= (max((b.round)::bigint))))
               Heap Blocks: exact=20638
               ->  Bitmap Index Scan on ix_runtime_related_transactions_round_index  (cost=0.00..678247.81 rows=17932419 width=0) (actual time=3893.056..3893.057 rows=1452131 loops=1)
                     Index Cond: ((runtime = 'emerald'::runtime) AND ((tx_round)::bigint >= (min((b.round)::bigint))) AND ((tx_round)::bigint <= (max((b.round)::bigint))))
 Planning Time: 0.454 ms
 Execution Time: 18811.122 ms
(12 rows)

@Andrew7234:

The query itself is using the indexes as expected(?); it's just slow because emerald in 2022 was quite active. Each query returns ~4,600 accounts.

The stats are making progress, but slowly, since we recalculate daily_active_accounts in 5-minute window increments. The batch timeout is currently 3 minutes, which allows ~5-6 of these queries to finish before the timeout hits.


@Andrew7234:

Options:

  • Parallelize the queries within the daily_active_accounts stat, e.g. using the item-based analyzer. Need to check how the db would handle the load.
  • Parallelize the different stats computations within the aggregate_stats analyzer. Currently, each query of each stat is computed sequentially.
  • Increase window_step for daily active accounts

@mitjat:

Thank you for digging and explaining! That looks pretty good to me, actually. And if 5-6 queries succeed per batch, it means we're only throwing out about 10% of compute on average (the ~0.5 queries cut off by the timeout, out of ~5).

Speed projection: we process ~30 minutes of chain time in every 3-minute batch, so to get through the spammy ~3 months (IIRC) of emerald, we'll need 0.3 months, or about 10 days. That's acceptable IMO.

Of your options above, I like option 2 best; it seems simple enough. But for now, I don't know that we should invest time even into that. Your call on whether it's a really quick fix; I'm fine to just close the ticket after this exploration of yours.

@Andrew7234:

It turns out the query was slow not just for the busy period of emerald, but in general. On sapphire it was even slower, to the point where even option 2 would not have helped enough for the analyzer to catch up within a reasonable time frame. Discussed on Slack with Warren; it looks like Postgres doesn't recognize that the timestamp column is monotonic (it increases with round) and therefore produces strange, inefficient query plans.

Warren crafted a much more efficient query (~500ms), which has been added in #622.
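
For intuition, here is a minimal sketch of one common way to encode that monotonicity by hand (illustrative only, assuming round increases with timestamp; the actual rewrite is in #622). Instead of the min()/max() aggregate, which scans every block index entry in the window (~10k rows in the plan above), two ORDER BY ... LIMIT 1 probes each touch a single index entry to fetch the boundary rounds:

-- Illustrative sketch, not necessarily the exact query from #622.
-- Because round is monotonic in timestamp, the earliest/latest block in the
-- window carries min_round/max_round; each probe is a single index descent
-- on ix_runtime_blocks_timestamp instead of a scan over the whole window.
WITH bounds AS (
    SELECT
        (SELECT round FROM chain.runtime_blocks
         WHERE runtime = 'emerald'
           AND timestamp >= '2022-03-15T02:20:00Z' AND timestamp < '2022-03-16T02:20:00Z'
         ORDER BY timestamp ASC LIMIT 1) AS min_round,
        (SELECT round FROM chain.runtime_blocks
         WHERE runtime = 'emerald'
           AND timestamp >= '2022-03-15T02:20:00Z' AND timestamp < '2022-03-16T02:20:00Z'
         ORDER BY timestamp DESC LIMIT 1) AS max_round
)
SELECT COUNT(DISTINCT account_address)
FROM chain.runtime_related_transactions AS rt, bounds
WHERE rt.runtime = 'emerald'
  AND rt.tx_round >= bounds.min_round
  AND rt.tx_round <= bounds.max_round;

Note that the bounds computation was only ~0.4s of the ~18.8s in the plan above; most of the time goes to scanning the ~1.45M related-transaction rows and the COUNT(DISTINCT ...), so the actual #622 rewrite presumably restructures that part as well.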

@csillag commented Dec 22, 2024

Resolved by #622
