
[Bug]: Significant difference in buffers hit with OR clause affecting planning time #7646

Open
mkindahl opened this issue Feb 3, 2025 · 4 comments

@mkindahl
Contributor

mkindahl commented Feb 3, 2025

What type of bug is this?

Performance issue

What subsystems and features are affected?

Query planner

What happened?

If a clause uses an OR expression, or anything that is translated into an OR expression (such as `x IN (...)`), the planning time goes up significantly.

TimescaleDB version affected

2.18.0

PostgreSQL version used

17.2

What operating system did you use?

Ubuntu 24.04 x64

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

This is the query plan without the BETWEEN qualifier:

QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using _hyper_1_71_chunk_readings_recorded_at_idx on _hyper_1_71_chunk  (cost=0.28..12.59 rows=2 width=24) (actual time=0.014..0.019 rows=2 loops=1)
   Index Cond: (recorded_at = ANY ('{"2025-03-11 08:53:00+01","2025-03-11 12:43:00+01"}'::timestamp with time zone[]))
   Buffers: shared hit=9
 Planning:
   Buffers: shared hit=10187
 Planning Time: 15.861 ms
 Execution Time: 0.126 ms
(7 rows)

This is the query plan with the BETWEEN qualifier:

                                                                                                                               QUERY PLAN                                                                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using _hyper_1_71_chunk_readings_recorded_at_idx on _hyper_1_71_chunk  (cost=0.28..12.60 rows=2 width=24) (actual time=0.014..0.019 rows=2 loops=1)
   Index Cond: ((recorded_at = ANY ('{"2025-03-11 08:53:00+01","2025-03-11 12:43:00+01"}'::timestamp with time zone[])) AND (recorded_at >= '2025-03-11 00:00:00+01'::timestamp with time zone) AND (recorded_at <= '2025-03-11 23:59:59+01'::timestamp with time zone))
   Buffers: shared hit=9
 Planning:
   Buffers: shared hit=771
 Planning Time: 1.038 ms
 Execution Time: 0.065 ms
(7 rows)

How can we reproduce the bug?

-- Create a hypertable with a lot of chunks
create table readings(
    record_id serial,
    recorded_at timestamptz not null,
    device_id integer,
    temperature float
);

select * from create_hypertable('readings', 'recorded_at',
       chunk_time_interval => interval '1 day');

insert into readings(recorded_at, device_id, temperature)
select recorded_at, (random()*30)::int, random()*80 - 40
  from generate_series(timestamptz '2025-01-01 00:00:00',
                       timestamptz '2025-06-01 00:00:00',
                       interval '1 min') as recorded_at;

-- This IN statement will be transformed into an OR statement. Note
-- the number of buffers hit.
explain (analyze, buffers)
select * from readings
 where recorded_at in ('2025-03-11 08:53:00+01', '2025-03-11 12:43:00+01');

-- Here we add an additional constraint to limit the number of chunks
-- that need to be scanned.
explain (analyze, buffers)
select * from readings
 where recorded_at in ('2025-03-11 08:53:00+01', '2025-03-11 12:43:00+01')
   and recorded_at between '2025-03-11 00:00:00+01' and '2025-03-11 23:59:59+01';
@mkindahl mkindahl added the bug label Feb 3, 2025
@mkindahl
Contributor Author

mkindahl commented Feb 3, 2025

We have this comment in `dimension_restrict_info_open_add`, which skips adding the constraint when it is an IN/ANY list with more than one value:

	/* can't handle IN/ANY with multiple values */
	if (dimvalues->use_or && list_length(dimvalues->values) > 1)
		return false;

@erimatnor
Contributor

Looks like `IN (x,y,z)` expressions currently can't be used for chunk pruning. One idea to fix this is to transform the expression into a `SK_SEARCHARRAY` scan key and use that when scanning for matching dimension slices (requires an index scan).
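For illustration, a rough sketch of what building such a scan key might look like (non-compilable fragment; it uses PostgreSQL's `ScanKeyInit`/`SK_SEARCHARRAY` API, but the attribute number, the array construction inputs, and all of the surrounding TimescaleDB dimension-slice scan plumbing are assumptions, not the actual fix):

```c
/* Sketch: build an "= ANY(array)" scan key so a single index scan over
 * the dimension-slice catalog can match every value in the IN list,
 * instead of falling back to scanning all chunks.
 *
 * `datums`/`nvalues` hold the IN-list values; `Anum_..._range_value`
 * is a hypothetical attribute number for the dimension-slice index.
 */
ArrayType *arr = construct_array(datums, nvalues, TIMESTAMPTZOID,
                                 sizeof(TimestampTz), FLOAT8PASSBYVAL,
                                 TYPALIGN_DOUBLE);
ScanKeyData scankey;

ScanKeyInit(&scankey,
            Anum_dimension_slice_range_value, /* hypothetical attno */
            BTEqualStrategyNumber,
            F_TIMESTAMPTZ_EQ,
            PointerGetDatum(arr));
scankey.sk_flags |= SK_SEARCHARRAY; /* key matches any array element */
```

With a search-array key, the btree index machinery iterates the array elements internally, so the slice lookup cost grows with the IN-list size rather than with the total number of chunks.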

@erimatnor erimatnor self-assigned this Feb 6, 2025
@RobAtticus
Member

Does this only apply when the clause is related to a dimension column (here, the timestamp column)? I feel like I've used a fair bit of IN but for non-dimension columns and never really noticed the planning time.

@mkindahl
Contributor Author

mkindahl commented Feb 7, 2025

> Does this only apply when the clause is related to a dimension column (here, the timestamp column)? I feel like I've used a fair bit of IN but for non-dimension columns and never really noticed the planning time.

Yes, this is related to using the partitioning column; clauses on non-partitioning columns are not affected.
