Re: Fwd: different execution time for the same query (and same DB status)

From: Francesco De Angelis <franc(dot)tozzus(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michel SALAIS <msalais(at)msym(dot)fr>
Subject: Re: Fwd: different execution time for the same query (and same DB status)
Date: 2021-03-12 13:39:13
Message-ID: CAHWr2CeYHuorg+tntkT1ije=Hyi+9pFRSH4W0_JnemZ6R=M7rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have re-tested the execution times with several different values of
shared_buffers in the range 256 MB - 4 GB.
It didn't solve the problem and I noticed that for values greater than 3GB
the executions halt very frequently.
I also tried to disable JIT and this further slowed it down.
But there is an interesting news. I managed to exploit some properties of
the data I am modelling and I have changed the types of the tables and the
query as follows:

CREATE TABLE A (
a1 int2,
a2 int2,
v int4 primary key
);

CREATE TABLE B (
a1 int2,
a2 int2,
v int4 primary key
);

create index hash_pkA on A using hash(v);
create index hash_pkB on B using hash(v);

CREATE TABLE C (
la1 int2,
la2 int2,
va1 int2,
va2 int2,
res text,
c int8
);
create index hash_C on C using hash(c);

select count(*) from (
((select A.v,
coalesce(A.a1,0) as la1,
coalesce(A.a2,0) as la2,
coalesce(B.a1,0) as va1,
coalesce(B.a2,0) as va2
from A
left join B on A.v = B.v)
union all
select B.v,
0 as la1,
0 as la2,
B.a1 as va1,
B.a2 as va2
from B where B.v not in (select A.v from A))as
ta inner join C on
ta.la1 | (ta.la2::int8 << 10) |
(ta.va1::int8 << 20) |
(ta.va2::int8 << 30)
= C.c);

With these changes I get stable results around 15 seconds. Here is the
plan: https://explain.depesz.com/s/Y9dT.
I also verified that I can decrease work_mem to 300MB (against 800MB of the
original query) by keeping the same execution time. In the original one,
decreasing such a value worsens the overall performances instead.
In the new query there is only one comparison on a single column (they were
four in the original one) and I started guessing whether, in the previous
case, the DBMS considers the overall memory consumption is too high and
changes the plan. If yes, I would be interesting in understanding how the
optimisation algorithm works and whether there is a way to disable it.
In this way I can try to better figure out what to do, in the future, in
case the data model cannot be re-arranged like in this case.
Thanks again.

Best regards,
Francesco De Angelis

Il giorno mer 10 mar 2021 alle ore 14:29 Michael Lewis <mlewis(at)entrata(dot)com>
ha scritto:

> I would increase shared_buffers to 1GB or more. Also, it would be very
> interesting to see these queries executed with JIT off.
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Lewis 2021-03-12 16:22:31 Re: Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?
Previous Message Thomas Munro 2021-03-12 09:33:29 Re: FreeBSD UFS & fsync