From: "agharta82(at)gmail(dot)com" <agharta82(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: A way to optimize sql about the last temporary-related row
Date: 2024-06-27 15:20:21
Message-ID: 451083be-83e8-413d-bc3a-ed7f3a6d99a9@gmail.com
Lists: pgsql-general
Hello everyone,
Sorry to bother you, but I have a query that is driving me crazy.
I need to retrieve, for each group, the last temporally valid record
according to a specific parameter.
First some data:
Linux Rocky 8.10 environment, minimal installation (in a KVM VM on
Fedora 40).
PostgreSQL 16.3, installed following the official PostgreSQL guide.
effective_cache_size = '1000MB';
shared_buffers = '500MB';
work_mem = '16MB';
The configuration changes are deliberately minimal so that anyone can
reproduce the problem.
Table script:
CREATE TABLE test_table
(
    pk_id int NOT NULL,
    integer_field_1 int,
    integer_field_2 int,
    datetime_field_1 timestamp,
    PRIMARY KEY (pk_id)
);
-- insert 4M records
insert into test_table(pk_id) select generate_series(1,4000000,1);
-- now set some random data, distributed between specific ranges (as in
-- my production table)
update test_table set
datetime_field_1 = timestamp '2000-01-01 00:00:00' + random() *
(timestamp '2024-05-31 23:59:59' - timestamp '2000-01-01 00:00:00'),
integer_field_1 = floor(random() * (6-1+1) + 1)::int,
integer_field_2 = floor(random() * (200000-1+1) + 1)::int;
-- indexes
CREATE INDEX idx_test_table_integer_field_1 ON test_table(integer_field_1);
CREATE INDEX xtest_table_datetime_field_1 ON test_table(datetime_field_1
desc);
CREATE INDEX idx_test_table_integer_field_2 ON test_table(integer_field_2);
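As a side note, and this is an assumption on my part that I have not
verified with EXPLAIN: a "latest row per integer_field_2" lookup is
often served better by a single composite index than by three
single-column ones. A sketch (the index name is illustrative):

-- hypothetical composite index: equality filter column first, then the
-- grouping column, then the sort column
CREATE INDEX idx_test_table_f1_f2_dt
    ON test_table (integer_field_1, integer_field_2, datetime_field_1 DESC);

With that column order the planner can apply the integer_field_1 = 1
filter on the leading column and read each integer_field_2 group
already sorted by datetime_field_1.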
--vacuum
vacuum full test_table;
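One detail worth noting (an addition to the script above, not part of
the original): VACUUM FULL rewrites the table but does not refresh
planner statistics, so an explicit ANALYZE afterwards ensures the plans
are based on the freshly generated data:

-- refresh planner statistics after the bulk update and rewrite
analyze test_table;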
Now the query:
explain (verbose, buffers, analyze)
with last_table_ids as materialized (
    select xx from (
        select LAST_VALUE(pk_id) over (
            partition by integer_field_2
            order by datetime_field_1
            RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) xx
        from test_table
        where integer_field_1 = 1
          and datetime_field_1 <= CURRENT_TIMESTAMP
    ) ww
    group by ww.xx
),
last_row_per_ids as (
    select tt.*
    from last_table_ids lt
    inner join test_table tt on (tt.pk_id = lt.xx)
)
select * /* or count(*) */ from last_row_per_ids;
This query takes 46 seconds on my PC!
I was expecting about 2-3 seconds (in line with my other queries on
this table), but it seems that the xtest_table_datetime_field_1 index
is not being used.
Do you think there is a way to optimize the query?
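One alternative formulation I have been considering is DISTINCT ON.
This is only a sketch: I have not validated that it returns the same
row as the window-function version when two rows of a group share the
same datetime_field_1 (the window version breaks such ties by physical
order, which is not deterministic either):

-- most recent qualifying row per integer_field_2, in one pass
select distinct on (integer_field_2) *
from test_table
where integer_field_1 = 1
  and datetime_field_1 <= current_timestamp
order by integer_field_2, datetime_field_1 desc;

It asks for the same thing in a form the planner can often satisfy
with a single sorted scan; adding pk_id as a final sort key would make
the tie-breaking deterministic.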
Thanks so much for the support,
Agharta