Re: Range partitioning query performance with date_trunc (vs timescaledb)

From: Philippe Pepiot <phil(at)philpep(dot)org>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Range partitioning query performance with date_trunc (vs timescaledb)
Date: 2023-09-11 12:21:30
Message-ID: ZP8GSpzTAbXZql4z@bezout.in.philpep.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 29/08/2023, David Rowley wrote:
> On Tue, 29 Aug 2023 at 19:40, Philippe Pepiot <phil(at)philpep(dot)org> wrote:
> > I'm trying to implement some range partitioning on timeseries data. But it
> > looks some queries involving date_trunc() doesn't make use of partitioning.
> >
> > BEGIN;
> > CREATE TABLE test (
> > time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
> > value FLOAT NOT NULL
> > ) PARTITION BY RANGE (time);
> > CREATE INDEX test_time_idx ON test(time DESC);
> > CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
> > CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
> > CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS value FROM test GROUP BY 1;
> > EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
> > ROLLBACK;
> >
> > The plan query all partitions:
>
> > I wonder if there is a way with a reasonable amount of SQL code to achieve this
> > with vanilla postgres ?
>
> The only options I see for you are
>
> 1) partition by LIST(date_Trunc('year', time)), or;
> 2) use a set-returning function instead of a view and pass the date
> range you want to select from the underlying table via parameters.
>
> I imagine you won't want to do #1. However, it would at least also
> allow the aggregation to be performed before the Append if you SET
> enable_partitionwise_aggregate=1.
>
> #2 isn't as flexible as a view as you'd have to create another
> function or expand the parameters of the existing one if you want to
> add items to the WHERE clause.
>
> Unfortunately, date_trunc is just a black box to partition pruning, so
> it's not able to determine that DATE_TRUNC('year', time) >=
> '2021-01-01' is the same as time >= '2021-01-01'. It would be
> possible to make PostgreSQL do that, but that's a core code change,
> not something that you can do from SQL.

Ok I think I'll go for Set-returning function since
LIST or RANGE on (date_trunc('year', time)) will break advantage of
partitioning when querying with "time betwen x and y".

Thanks!

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Frits Hoogland 2023-09-11 13:54:59 Re: Planning time is time-consuming
Previous Message David Rowley 2023-09-11 10:17:09 Re: Planning time is time-consuming