Re: Problem with a Query

From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Problem with a Query
Date: 2024-08-26 13:22:11
Message-ID: CANzqJaAuqkYKt64k9Qw6mzUoCLZCRx63AFsmLw6uGxukkxE1-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Aggressive autoanalyze and autovacuum settings solve most query problems.
These are my settings:
default_statistics_target = 5000
autovacuum_vacuum_scale_factor = 0.015
autovacuum_vacuum_threshold = 250
autovacuum_analyze_scale_factor = 0.015
autovacuum_analyze_threshold = 250

Such a high default_statistics_target value is controversial, but works for
our databases, and resetting it to 100 doesn't noticably speed up slow
parse/optimize on queries that take a long time to parse/optimize any more
than the 5000 value.

On Mon, Aug 26, 2024 at 6:30 AM Siraj G <tosiraj(dot)g(at)gmail(dot)com> wrote:

> Thanks Tom. Collecting full stats on the tables involved corrected the
> execution.
>
> On Tue, Aug 13, 2024 at 9:57 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Siraj G <tosiraj(dot)g(at)gmail(dot)com> writes:
>> > We migrated a PgSQL database from Cloud SQL to compute engine and since
>> > then there is a SQL we observed taking a long time. After some study, I
>> > found that the SQL is using NESTED LOOP where the cost is too high.
>>
>> The core of your problem seems to be here:
>>
>> > -> Index Scan using
>> marketing_a_cancel__55ffff_idx on
>> > marketing_app_leadhistory w0 (cost=0.57..4274.30 rows=1 width=8)
>> (actual
>> > time=46.678..51.232 rows=44 loops=1)
>> > Index Cond: ((cancel_event_id IS NOT NULL)
>> AND
>> > (cancel_event_type = 1))
>> > Filter: ((status_id = 93) AND
>> > ((followup_date)::date >= '2024-08-01'::date) AND
>> ((followup_date)::date <=
>> > '2024-08-07'::date))
>> > Rows Removed by Filter: 22268
>> > Buffers: shared hit=9170 read=19
>>
>> If the planner had estimated 40-some rows out of this step, rather
>> than one, it would certainly not have chosen to use nestloop joins
>> atop this. So the big problem to focus on is making that estimate
>> better.
>>
>> A secondary problem is that the choice of index seems poor: the
>> index itself is selecting 44+22268 = 22312 rows and then the filter
>> condition is throwing away 99.8% of those rows. Probably, using
>> an index on (status_id, followup_date) would have worked better.
>>
>> I suspect that both of these things are tied to the non-normalization
>> of your "cancel" condition. The planner probably believes that
>> "cancel_event_id IS NOT NULL" is statistically independent of
>> "cancel_event_type = 1"; but I'll bet it isn't, and thus the index
>> condition selects many more rows than the planner guessed. You might
>> be able to improve that estimate by creating extended stats on both of
>> those columns, but really a better idea would be to take a step back
>> and figure out if those two columns can't be merged into one.
>>
>> regards, tom lane
>>
>

--
Death to America, and butter sauce.
Iraq lobster!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2024-08-26 14:39:09 Re: Issue Installing PostgreSQL 17 on Ubuntu 22.04 (Jammy)
Previous Message Ray O'Donnell 2024-08-26 11:48:18 Re: Issue Installing PostgreSQL 17 on Ubuntu 22.04 (Jammy)