
[Bug]: Significant difference in buffers hit with OR clause affecting planning time #7646

Open
mkindahl opened this issue Feb 3, 2025 · 4 comments

@mkindahl
Contributor

mkindahl commented Feb 3, 2025

What type of bug is this?

Performance issue

What subsystems and features are affected?

Query planner

What happened?

If a clause uses an OR expression, or anything that is translated into an OR expression (such as `x IN (...)`), the planning time goes up significantly.

TimescaleDB version affected

2.18.0

PostgreSQL version used

17.2

What operating system did you use?

Ubuntu 24.04 x64

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

This is the query plan without the BETWEEN qualifier:

QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using _hyper_1_71_chunk_readings_recorded_at_idx on _hyper_1_71_chunk  (cost=0.28..12.59 rows=2 width=24) (actual time=0.014..0.019 rows=2 loops=1)
   Index Cond: (recorded_at = ANY ('{"2025-03-11 08:53:00+01","2025-03-11 12:43:00+01"}'::timestamp with time zone[]))
   Buffers: shared hit=9
 Planning:
   Buffers: shared hit=10187
 Planning Time: 15.861 ms
 Execution Time: 0.126 ms
(7 rows)

This is the query plan with the BETWEEN qualifier:

                                                                                                                               QUERY PLAN                                                                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using _hyper_1_71_chunk_readings_recorded_at_idx on _hyper_1_71_chunk  (cost=0.28..12.60 rows=2 width=24) (actual time=0.014..0.019 rows=2 loops=1)
   Index Cond: ((recorded_at = ANY ('{"2025-03-11 08:53:00+01","2025-03-11 12:43:00+01"}'::timestamp with time zone[])) AND (recorded_at >= '2025-03-11 00:00:00+01'::timestamp with time zone) AND (recorded_at <= '2025-03-11 23:59:59+01'::timestamp with time zone))
   Buffers: shared hit=9
 Planning:
   Buffers: shared hit=771
 Planning Time: 1.038 ms
 Execution Time: 0.065 ms
(7 rows)

How can we reproduce the bug?

-- Create a hypertable with a lot of chunks
create table readings(
    record_id serial,
    recorded_at timestamptz not null,
    device_id integer,
    temperature float
);

select * from create_hypertable('readings', 'recorded_at',
       chunk_time_interval => interval '1 day');

insert into readings(recorded_at, device_id, temperature)
select recorded_at, (random()*30)::int, random()*80 - 40
  from generate_series(timestamptz '2025-01-01 00:00:00',
                       timestamptz '2025-06-01 00:00:00',
                       interval '1 min') as recorded_at;

-- This IN statement will be transformed into an OR statement. Note
-- the number of buffers hit.
explain (analyze, buffers)
select * from readings
 where recorded_at in ('2025-03-11 08:53:00+01', '2025-03-11 12:43:00+01');

-- Here we add an additional constraint to limit the number of chunks
-- that need to be scanned.
explain (analyze, buffers)
select * from readings
 where recorded_at in ('2025-03-11 08:53:00+01', '2025-03-11 12:43:00+01')
   and recorded_at between '2025-03-11 00:00:00+01' and '2025-03-11 23:59:59+01';
@mkindahl mkindahl added the bug label Feb 3, 2025
@mkindahl
Contributor Author

mkindahl commented Feb 3, 2025

We have this comment in `dimension_restrict_info_open_add`, which skips adding the constraint when it is an IN/ANY list with more than one value:

	/* can't handle IN/ANY with multiple values */
	if (dimvalues->use_or && list_length(dimvalues->values) > 1)
		return false;

@erimatnor
Contributor

Looks like `IN (x,y,z)` expressions currently can't be used for chunk pruning. One idea to fix this is to transform the expression into a `SK_SEARCHARRAY` scan key and use that when scanning for matching dimension slices (requires an index scan).
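For illustration, a rough sketch of what building such a scan key might look like (non-compilable fragment; it uses PostgreSQL's `ScanKeyInit`/`SK_SEARCHARRAY` API, but the attribute number, the array construction inputs, and all of the surrounding TimescaleDB dimension-slice scan plumbing are assumptions, not the actual fix):

```c
/* Sketch: build an "= ANY(array)" scan key so a single index scan over
 * the dimension-slice catalog can match every value in the IN list,
 * instead of falling back to scanning all chunks.
 *
 * `datums`/`nvalues` hold the IN-list values; `Anum_..._range_value`
 * is a hypothetical attribute number for the dimension-slice index.
 */
ArrayType *arr = construct_array(datums, nvalues, TIMESTAMPTZOID,
                                 sizeof(TimestampTz), FLOAT8PASSBYVAL,
                                 TYPALIGN_DOUBLE);
ScanKeyData scankey;

ScanKeyInit(&scankey,
            Anum_dimension_slice_range_value, /* hypothetical attno */
            BTEqualStrategyNumber,
            F_TIMESTAMPTZ_EQ,
            PointerGetDatum(arr));
scankey.sk_flags |= SK_SEARCHARRAY; /* key matches any array element */
```

With a search-array key, the btree index machinery iterates the array elements internally, so the slice lookup cost grows with the IN-list size rather than with the total number of chunks.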

@erimatnor erimatnor self-assigned this Feb 6, 2025
@RobAtticus
Member

Does this only apply when the clause is related to a dimension column (here, the timestamp column)? I feel like I've used a fair bit of IN but for non-dimension columns and never really noticed the planning time.

@mkindahl
Contributor Author

mkindahl commented Feb 7, 2025

> Does this only apply when the clause is related to a dimension column (here, the timestamp column)? I feel like I've used a fair bit of IN but for non-dimension columns and never really noticed the planning time.

Yes, this is related to using the partitioning column; clauses on non-partitioning columns are not affected.
