Quick Links

Range partitioning query performance with date_trunc (vs timescaledb)

From:	Philippe Pepiot <phil(at)philpep(dot)org>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Range partitioning query performance with date_trunc (vs timescaledb)
Date:	2023-08-29 07:40:06
Message-ID:	ZO2g1l1ho60LSEBt@bezout.in.philpep.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi,

I'm trying to implement some range partitioning on timeseries data. But it
looks some queries involving date_trunc() doesn't make use of partitioning.

BEGIN;
CREATE TABLE test (
time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
value FLOAT NOT NULL
) PARTITION BY RANGE (time);
CREATE INDEX test_time_idx ON test(time DESC);
CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS value FROM test GROUP BY 1;
EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
ROLLBACK;

The plan query all partitions:

HashAggregate
Group Key: (date_trunc('year'::text, test."time"))
-> Append
-> Seq Scan on test_y2010 test_1
Filter: (date_trunc('year'::text, "time") >= '2021-01-01 00:00:00'::timestamp without time zone)
-> Seq Scan on test_y2011 test_2
Filter: (date_trunc('year'::text, "time") >= '2021-01-01 00:00:00'::timestamp without time zone)

The view is there so show the use case, but we get almost similar plan with SELECT * FROM test WHERE DATE_TRUNC('year', time) >= TIMESTAMP '2021-01-01';

I tested a variation with timescaledb which seem using trigger based
partitioning:

BEGIN;
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE test (
time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
value FLOAT NOT NULL
);
SELECT create_hypertable('test', 'time', chunk_time_interval => INTERVAL '1 year');
CREATE VIEW vtest AS SELECT time_bucket('1 year', time) AS time, SUM(value) AS value FROM test GROUP BY 1;
-- insert some data as partitions are created on the fly
INSERT INTO test VALUES (TIMESTAMP '2020-01-15', 1.0), (TIMESTAMP '2021-12-15', 2.0);
\d+ test
EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
ROLLBACK;

The plan query a single partition:

GroupAggregate
Group Key: (time_bucket('1 year'::interval, _hyper_1_2_chunk."time"))
-> Result
-> Index Scan Backward using _hyper_1_2_chunk_test_time_idx on _hyper_1_2_chunk
Index Cond: ("time" >= '2021-01-01 00:00:00'::timestamp without time zone)
Filter: (time_bucket('1 year'::interval, "time") >= '2021-01-01 00:00:00'::timestamp without time zone)

Note single partition query only works with time_bucket(), not with date_trunc(), I guess
there is some magic regarding this in time_bucket() implementation.

I wonder if there is a way with a reasonable amount of SQL code to achieve this
with vanilla postgres ?

Maybe by taking assumption that DATE_TRUNC(..., time) <= time ?

Thanks!

Responses

Re: Range partitioning query performance with date_trunc (vs timescaledb) at 2023-08-29 09:38:05 from David Rowley

Browse pgsql-performance by date

	From	Date	Subject
Next Message	David Rowley	2023-08-29 09:38:05	Re: Range partitioning query performance with date_trunc (vs timescaledb)
Previous Message	Peter Geoghegan	2023-08-29 01:49:13	Re: Index bloat and REINDEX/VACUUM optimization for partial index