Re: Query performance issue

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Nagaraj Raj <nagaraj(dot)sf(at)yahoo(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Query performance issue
Date: 2021-01-22 02:35:14
Message-ID: 20210122023514.GD27167@telsasoft.com
Lists: pgsql-performance

On Fri, Jan 22, 2021 at 01:53:26AM +0000, Nagaraj Raj wrote:
> Tables ddl are attached in dbfiddle -- Postgres 11 | db<>fiddle
> Server configuration is:
> Version: 10.11
> RAM - 320GB
> vCPU - 32
> "maintenance_work_mem" 256MB
> "work_mem" 1GB
> "shared_buffers" 64GB

> Aggregate (cost=31.54..31.55 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
>   -> Nested Loop (cost=0.00..31.54 rows=1 width=8) (actual time=0.007..0.008 rows=0 loops=1)
>         Join Filter: (a.household_entity_proxy_id = c.household_entity_proxy_id)
>         -> Nested Loop (cost=0.00..21.36 rows=1 width=16) (actual time=0.006..0.007 rows=0 loops=1)
>               Join Filter: (a.individual_entity_proxy_id = b.individual_entity_proxy_id)
>               -> Seq Scan on prospect a (cost=0.00..10.82 rows=1 width=16) (actual time=0.006..0.006 rows=0 loops=1)
>                     Filter: (((last_contacted_anychannel_dttm IS NULL) OR (last_contacted_anychannel_dttm < '2020-11-23 00:00:00'::timestamp without time zone)) AND (shared_paddr_with_customer_ind = 'N'::bpchar) AND (profane_wrd_ind = 'N'::bpchar) AND (tmo_ofnsv_name_ind = 'N'::bpchar) AND (has_individual_address = 'Y'::bpchar) AND (has_last_name = 'Y'::bpchar) AND (has_first_name = 'Y'::bpchar))
>               -> Seq Scan on individual_demographic b (cost=0.00..10.53 rows=1 width=8) (never executed)
>                     Filter: ((tax_bnkrpt_dcsd_ind = 'N'::bpchar) AND (govt_prison_ind = 'N'::bpchar) AND ((cstmr_prspct_ind)::text = 'Prospect'::text))
>         -> Seq Scan on household_demographic c (cost=0.00..10.14 rows=3 width=8) (never executed)
>               Filter: (((hspnc_lang_prfrnc_cval)::text = ANY ('{B,E,X}'::text[])) OR (hspnc_lang_prfrnc_cval IS NULL))
> Planning Time: 1.384 ms
> Execution Time: 0.206 ms
> (13 rows)

It's doing nested loops with estimated rowcount=1, which indicates a bad
underestimate, and suggests that the conditions are redundant or correlated.
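
One way to confirm the underestimate, as a rough sketch: run the filter on a single table by itself and compare the planner's estimated "rows=" with the actual count. The conditions below are copied from the Filter line on prospect in the plan above; the count(*) wrapper is only an assumption for illustration, not the original query.

```sql
-- Compare estimated vs. actual row counts for the prospect filter alone.
-- A large gap between "rows=" in the estimate and in "actual" points to
-- correlated conditions that the planner treats as independent.
EXPLAIN ANALYZE
SELECT count(*)
FROM prospect a
WHERE (a.last_contacted_anychannel_dttm IS NULL
       OR a.last_contacted_anychannel_dttm < '2020-11-23 00:00:00')
  AND a.shared_paddr_with_customer_ind = 'N'
  AND a.profane_wrd_ind = 'N'
  AND a.tmo_ofnsv_name_ind = 'N'
  AND a.has_individual_address = 'Y'
  AND a.has_last_name = 'Y'
  AND a.has_first_name = 'Y';
```

The same check can be repeated for individual_demographic with its three conditions.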

Maybe you can handle this with MV stats on the correlated columns:

CREATE STATISTICS prospect_stats (dependencies) ON
shared_paddr_with_customer_ind, profane_wrd_ind, tmo_ofnsv_name_ind, has_individual_address, has_last_name, has_first_name
FROM prospect;
CREATE STATISTICS individual_demographic_stats (dependencies) ON
tax_bnkrpt_dcsd_ind, govt_prison_ind, cstmr_prspct_ind
FROM individual_demographic;
-- One table per ANALYZE, since a multi-table ANALYZE needs v11 or later
ANALYZE prospect;
ANALYZE individual_demographic;

Since it's expensive to compute stats on a large number of columns, I'd then
check *which* are correlated and then only compute MV stats on those. This
will show dependencies as col1=>col2: X; where X approaches 1, the columns
are highly correlated:
SELECT * FROM pg_statistic_ext; -- pg_statistic_ext_data since v12
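
As a narrower sketch of that check, these queries pull only the dependency data for the two statistics objects created above. Run whichever matches the server version; the column names are from the system catalogs as I understand them, so verify them against the docs for your release.

```sql
-- v10 and v11: the dependencies are stored in pg_statistic_ext itself.
SELECT stxname, stxkeys, stxdependencies
FROM pg_statistic_ext
WHERE stxname IN ('prospect_stats', 'individual_demographic_stats');

-- v12 and later: the computed data moved to pg_statistic_ext_data.
SELECT s.stxname, s.stxkeys, d.stxddependencies
FROM pg_statistic_ext s
JOIN pg_statistic_ext_data d ON d.stxoid = s.oid
WHERE s.stxname IN ('prospect_stats', 'individual_demographic_stats');
```

The numbers in stxkeys and in the dependency output are column attribute numbers, which can be mapped back to names through pg_attribute.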

Also, as a diagnostic tool to get "explain analyze" to finish, you can
SET enable_nestloop=off;
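
To keep that setting from leaking into the rest of the session, it could be scoped to a transaction. This is only a sketch: the SELECT is a simplified stand-in built from the join conditions in the plan above, not the original query, and count(*) is an assumption.

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)  -- stand-in for the real query
FROM prospect a
JOIN individual_demographic b
  ON a.individual_entity_proxy_id = b.individual_entity_proxy_id
JOIN household_demographic c
  ON a.household_entity_proxy_id = c.household_entity_proxy_id;
ROLLBACK;
```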

--
Justin
