Re: Problem with a Query

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Siraj G <tosiraj(dot)g(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Problem with a Query
Date: 2024-08-13 04:27:11
Message-ID: 1155364.1723523231@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Siraj G <tosiraj(dot)g(at)gmail(dot)com> writes:
> We migrated a PgSQL database from Cloud SQL to compute engine and since
> then there is a SQL we observed taking a long time. After some study, I
> found that the SQL is using NESTED LOOP where the cost is too high.

The core of your problem seems to be here:

> -> Index Scan using marketing_a_cancel__55ffff_idx on
> marketing_app_leadhistory w0 (cost=0.57..4274.30 rows=1 width=8) (actual
> time=46.678..51.232 rows=44 loops=1)
> Index Cond: ((cancel_event_id IS NOT NULL) AND
> (cancel_event_type = 1))
> Filter: ((status_id = 93) AND
> ((followup_date)::date >= '2024-08-01'::date) AND ((followup_date)::date <=
> '2024-08-07'::date))
> Rows Removed by Filter: 22268
> Buffers: shared hit=9170 read=19

If the planner had estimated 40-some rows out of this step, rather
than one, it would certainly not have chosen to use nestloop joins
atop this. So the big problem to focus on is making that estimate
better.

A secondary problem is that the choice of index seems poor: the
index itself is selecting 44+22268 = 22312 rows and then the filter
condition is throwing away 99.8% of those rows. Probably, using
an index on (status_id, followup_date) would have worked better.

I suspect that both of these things are tied to the non-normalization
of your "cancel" condition. The planner probably believes that
"cancel_event_id IS NOT NULL" is statistically independent of
"cancel_event_type = 1"; but I'll bet it isn't, and thus the index
condition selects many more rows than the planner guessed. You might
be able to improve that estimate by creating extended stats on both of
those columns, but really a better idea would be to take a step back
and figure out if those two columns can't be merged into one.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Durumdara 2024-08-13 09:47:10 PG Dump on 11 - Restore on 16 - is possible?
Previous Message Siraj G 2024-08-13 04:03:50 Problem with a Query