query timing out: daily active users #862

Closed
csillag opened this issue Dec 22, 2024 · 1 comment · Fixed by #622
Comments

@csillag commented Dec 22, 2024

This query was recently made more efficient in #607 (cc @Andrew7234), but now it's consistently timing out again.

From mainnet staging logs:

WITH relevant_rounds AS (
    SELECT min(round) AS min_round, max(round) AS max_round
    FROM chain.runtime_blocks AS b
    WHERE (b.runtime = $1 AND b.timestamp >= $2::timestamptz AND b.timestamp < $3::timestamptz)
)
SELECT COUNT(DISTINCT account_address)
FROM chain.runtime_related_transactions AS rt, relevant_rounds
WHERE (rt.runtime = $1 AND tx_round >= relevant_rounds.min_round AND tx_round <= relevant_rounds.max_round)

args: [
  0: "emerald"
  1: "2022-03-15T02:20:00Z"
  2: "2022-03-16T02:20:00Z"
]

Discussion:

@Andrew7234:

oasisindexer=> explain analyze WITH relevant_rounds AS (
        SELECT min(round) as min_round, max(round) as max_round
        FROM chain.runtime_blocks AS b
        WHERE (b.runtime = 'emerald' AND b.timestamp >= '2022-03-15T02:20:00Z' AND b.timestamp < '2022-03-16T02:20:00Z')
)
SELECT COUNT(DISTINCT account_address)
FROM chain.runtime_related_transactions AS rt, relevant_rounds
WHERE (rt.runtime = 'emerald' AND tx_round >= relevant_rounds.min_round AND tx_round <= relevant_rounds.max_round);
                                                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6654916.70..6654916.71 rows=1 width=8) (actual time=18810.986..18810.994 rows=1 loops=1)
   ->  Nested Loop  (cost=705652.14..6610085.65 rows=17932419 width=47) (actual time=4268.994..12804.725 rows=1452131 loops=1)
         ->  Aggregate  (cost=22921.23..22921.24 rows=1 width=16) (actual time=369.592..369.597 rows=1 loops=1)
               ->  Index Scan using ix_runtime_blocks_timestamp on runtime_blocks b  (cost=0.56..22890.94 rows=6057 width=8) (actual time=2.250..367.111 rows=9937 loops=1)
                     Index Cond: ((runtime = 'emerald'::runtime) AND ("timestamp" >= '2022-03-15 02:20:00+00'::timestamp with time zone) AND ("timestamp" < '2022-03-16 02:20:00+00'::timestamp with time zone))
         ->  Bitmap Heap Scan on runtime_related_transactions rt  (cost=682730.91..6407840.22 rows=17932419 width=55) (actual time=3899.384..12176.384 rows=1452131 loops=1)
               Recheck Cond: ((runtime = 'emerald'::runtime) AND ((tx_round)::bigint >= (min((b.round)::bigint))) AND ((tx_round)::bigint <= (max((b.round)::bigint))))
               Heap Blocks: exact=20638
               ->  Bitmap Index Scan on ix_runtime_related_transactions_round_index  (cost=0.00..678247.81 rows=17932419 width=0) (actual time=3893.056..3893.057 rows=1452131 loops=1)
                     Index Cond: ((runtime = 'emerald'::runtime) AND ((tx_round)::bigint >= (min((b.round)::bigint))) AND ((tx_round)::bigint <= (max((b.round)::bigint))))
 Planning Time: 0.454 ms
 Execution Time: 18811.122 ms
(12 rows)

@Andrew7234:

The query itself is using the indexes as expected(?); it's just slow because emerald in 2022 was quite active. Each query returns ~4,600 accounts.

The stats are making progress, but slowly, since we recalculate daily_active_accounts in 5-minute window increments. The batch timeout is currently 3 minutes, which allows ~5-6 of these queries to finish before the timeout hits.


@Andrew7234:

Options:

  • Parallelize the queries within the daily_active_accounts stat, e.g. using the item-based analyzer. Need to check how the db would handle the load.
  • Parallelize the different stats computations within the aggregate_stats analyzer. Currently, each query of each stat is computed sequentially.
  • Increase window_step for daily active accounts

@mitjat:

Thank you for digging and explaining! That looks pretty good to me, actually. And if 5-6 queries succeed per batch, it means we're only throwing out about 10% of compute on average (the ~0.5 queries cut off by the timeout, out of ~5).

Speed projection: we process ~30 minutes of chain time in every 3-minute batch, so to get through the spammy ~3 months (IIRC) of emerald, we'll need 0.3 months, or about 10 days. That's acceptable IMO.

Of your options above, I like option 2 best; it seems simple enough. But for now, I don't know that we should invest time even into that. Your call on whether it's a really quick fix; I'm fine to just close the ticket after this exploration of yours.

@Andrew7234:

It turns out the query was slow not just for the busy period of emerald, but in general. On sapphire it was even slower, to the point where even option 2 would not have helped enough for the analyzer to catch up within a reasonable time frame. Discussed on Slack with Warren; it looks like Postgres doesn't recognize that the timestamp column is monotonic (it increases with round) and therefore produces strange, inefficient query plans.

Warren crafted a much more efficient query (~500ms), which has been added in #622.
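
For intuition, here is a minimal sketch of one common way to encode that monotonicity by hand (illustrative only, assuming round increases with timestamp; the actual rewrite is in #622). Instead of the min()/max() aggregate, which scans every block index entry in the window (~10k rows in the plan above), two ORDER BY ... LIMIT 1 probes each touch a single index entry to fetch the boundary rounds:

-- Illustrative sketch, not necessarily the exact query from #622.
-- Because round is monotonic in timestamp, the earliest/latest block in the
-- window carries min_round/max_round; each probe is a single index descent
-- on ix_runtime_blocks_timestamp instead of a scan over the whole window.
WITH bounds AS (
    SELECT
        (SELECT round FROM chain.runtime_blocks
         WHERE runtime = 'emerald'
           AND timestamp >= '2022-03-15T02:20:00Z' AND timestamp < '2022-03-16T02:20:00Z'
         ORDER BY timestamp ASC LIMIT 1) AS min_round,
        (SELECT round FROM chain.runtime_blocks
         WHERE runtime = 'emerald'
           AND timestamp >= '2022-03-15T02:20:00Z' AND timestamp < '2022-03-16T02:20:00Z'
         ORDER BY timestamp DESC LIMIT 1) AS max_round
)
SELECT COUNT(DISTINCT account_address)
FROM chain.runtime_related_transactions AS rt, bounds
WHERE rt.runtime = 'emerald'
  AND rt.tx_round >= bounds.min_round
  AND rt.tx_round <= bounds.max_round;

Note that the bounds computation was only ~0.4s of the ~18.8s in the plan above; most of the time goes to scanning the ~1.45M related-transaction rows and the COUNT(DISTINCT ...), so the actual #622 rewrite presumably restructures that part as well.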

@csillag commented Dec 22, 2024

Resolved by #622
