Re: Accounting for between table correlation

From: Alexander Stoddard <alexander(dot)stoddard(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Thomas Kellerer <shammat(at)gmx(dot)net>
Subject: Re: Accounting for between table correlation
Date: 2021-01-15 22:27:14
Message-ID: CADDNc-C330AHOm4JudxxveA=DGW92KpFXygkjHsbPZyrF4N_MA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Jan 15, 2021 at 12:27 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> On Fri, Jan 15, 2021 at 10:22 AM Alexander Stoddard <
> alexander(dot)stoddard(at)gmail(dot)com> wrote:
>
>> The 'fast plans' use parallel seq scans. The 'slow plans' is using index
>> scans. It appears a good query plan correctly predicts it should be bulk
>> processing the tables but bad ones get fooled into trashing (hard disk, not
>> SSD) by mispredicting too few rows to join between the tables.
>>
>
> How many tables are involved?
>

The queries are complex, multiple joins to 10 plus tables, although most
are to tiny enum type lookup tables. I believe it is the join between the
two large tables that I have described that causes the issue, and that
seems to be reflected in the different strategies in the plans. For my own
learning and to clarify the problem I probably will have to try and
reproduce the behavior in a test case.

> Are you sure it is stats getting updated causing the change in behavior?
>
No I'm not sure, could something else flip a plan after an ANALYZE?
Differing performance of multiple runs of the same query could be due
caching etc. but that would be a timing difference without a change in
query plan. The output plans I see are radically different and correlate
with large magnitude performance changes.

> Are you hitting the genetic optimizer?
>

I am doing nothing to specify the optimizer. Do I have configurable options
in that regard? I was unaware of them.

Thank you,
Alex

On Fri, Jan 15, 2021 at 12:27 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> On Fri, Jan 15, 2021 at 10:22 AM Alexander Stoddard <
> alexander(dot)stoddard(at)gmail(dot)com> wrote:
>
>> The 'fast plans' use parallel seq scans. The 'slow plans' is using index
>> scans. It appears a good query plan correctly predicts it should be bulk
>> processing the tables but bad ones get fooled into trashing (hard disk, not
>> SSD) by mispredicting too few rows to join between the tables.
>>
>
> How many tables are involved? Are you sure it is stats getting updated
> causing the change in behavior? Are you hitting the genetic optimizer?
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2021-01-16 00:03:17 Re: PostgreSQL License Question
Previous Message Adrian Klaver 2021-01-15 21:57:39 Re: Best tools to monitor and fine tune postgres