Performance Issue -- "Materialize"

From: anand086 <anand086(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance Issue -- "Materialize"
Date: 2017-08-19 17:37:56
Message-ID: 1503164276875-5979128.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I am a Postgres Newbie and trying to learn :)We have a scenario wherein, one
of the SQL with different input value for import_num showing different
execution plan.As an example, with import_num = '4520440' the execution plan
shows Nested Loop and is taking ~12secs. With import_num = '4520460'
execution plan showed using "Materialize" and never completed. After I set
enable_material to off, the execution plan is changed using Hash Semi Join
and completes in less than 3 secs. SELECT count(*) FROM test_tab WHERE login
IN (SELECT DISTINCT login FROM test_tab WHERE import_num = '4520440' AND
login IS NOT NULL EXCEPT SELECT DISTINCT login FROM test_tab WHERE
import_num = '0' AND login IS NOT NULL) AND import_num =
'4520440';+--------+| count |+--------+| 746982 |+--------+(1 row)Time:
12054.274 ms
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+|
QUERY PLAN
|+-----------------------------------------------------------------------------------------------------------------------------------------------------------+|
Aggregate (cost=351405.08..351405.09 rows=1 width=8)
|| -> Nested Loop (cost=349846.23..350366.17 rows=415562 width=0)
|| -> HashAggregate (cost=349845.67..349847.67 rows=200 width=96)
|| Group Key: ("ANY_subquery".login)::text
|| -> Subquery Scan on "ANY_subquery"
(cost=340828.23..348557.47 rows=515282 width=96)
|| -> SetOp Except (cost=340828.23..343404.65
rows=515282 width=100)
|| -> Sort (cost=340828.23..342116.44
rows=515283 width=100)
|| Sort Key: "*SELECT* 1".login
|| -> Append (cost=0.56..275836.74
rows=515283 width=100)
|| -> Subquery Scan on "*SELECT* 1"
(cost=0.56..275834.70 rows=515282 width=12)
|| -> Unique
(cost=0.56..270681.88 rows=515282 width=8)
|| -> Index Only Scan
using ui_nkey_test_tab on test_tab test_tab_1 (cost=0.56..268604.07
rows=831125 width=8) ||
Index Cond: ((import_num = '4520440'::numeric) AND (login IS NOT NULL))
|| -> Subquery Scan on "*SELECT* 2"
(cost=0.56..2.04 rows=1 width=12)
|| -> Unique (cost=0.56..2.03
rows=1 width=8)
|| -> Index Only Scan
using ui_nkey_test_tab on test_tab test_tab_2 (cost=0.56..2.03 rows=1
width=8) ||
Index Cond: ((import_num = '0'::numeric) AND (login IS NOT NULL))
|| -> Index Only Scan using ui_nkey_test_tab on test_tab
(cost=0.56..2.58 rows=1 width=8)
|| Index Cond: ((import_num = '4520440'::numeric) AND (login =
("ANY_subquery".login)::text))
|+-----------------------------------------------------------------------------------------------------------------------------------------------------------+(19
rows)
SELECT count(*) FROM test_tab WHERE import_num = '4520460' and login IN
(SELECT DISTINCT login FROM test_tab WHERE import_num = '4520460' AND login
IS NOT NULL EXCEPT SELECT DISTINCT login FROM test_tab WHERE import_num =
'0' AND login IS NOT NULL);The SQL was never completing and had the below
SQL execution plan --
+-------------------------------------------------------------------------------------------------------------------------------------------+|
QUERY PLAN
|+-------------------------------------------------------------------------------------------------------------------------------------------+|
Aggregate (cost=6.14..6.15 rows=1 width=8)
|| -> Nested Loop Semi Join (cost=1.12..6.13 rows=1 width=0)
|| Join Filter: ((test_tab.login)::text =
("ANY_subquery".login)::text)
|| -> Index Only Scan using ui_nkey_test_tab on test_tab
(cost=0.56..2.02 rows=1 width=8)
|| Index Cond: (import_num = '4520460'::numeric)
|| -> Materialize (cost=0.56..4.10 rows=1 width=96)
|| -> Subquery Scan on "ANY_subquery" (cost=0.56..4.09
rows=1 width=96) ||
-> HashSetOp Except (cost=0.56..4.08 rows=1 width=100)
|| -> Append (cost=0.56..4.08 rows=2 width=100)
|| -> Subquery Scan on "*SELECT* 1"
(cost=0.56..2.04 rows=1 width=12) ||
-> Unique (cost=0.56..2.03 rows=1 width=8)
|| -> Index Only Scan using
ui_nkey_test_tab on test_tab test_tab_1 (cost=0.56..2.03 rows=1 width=8) ||
Index Cond: ((import_num = '4520460'::numeric) AND (login IS NOT NULL))
|| -> Subquery Scan on "*SELECT* 2"
(cost=0.56..2.04 rows=1 width=12) ||
-> Unique (cost=0.56..2.03 rows=1 width=8)
|| -> Index Only Scan using
ui_nkey_test_tab on test_tab test_tab_2 (cost=0.56..2.03 rows=1 width=8) ||
Index Cond: ((import_num = '0'::numeric) AND (login IS NOT NULL))
|+-------------------------------------------------------------------------------------------------------------------------------------------+(17
rows)
############################################## After I set enable_material
to off;#############################################SELECT count(*) FROM
test_tab WHERE import_num = '4520460' and login IN (SELECT DISTINCT login
FROM test_tab WHERE import_num = '4520460' AND login IS NOT NULL EXCEPT
SELECT DISTINCT login FROM test_tab WHERE import_num = '0' AND login IS NOT
NULL);+--------+| count |+--------+| 762599 |+--------+(1 row)Time:
2116.889 ms
+-------------------------------------------------------------------------------------------------------------------------------------------+|
QUERY PLAN
|+-------------------------------------------------------------------------------------------------------------------------------------------+|
Aggregate (cost=6.13..6.14 rows=1 width=8)
|| -> Hash Semi Join (cost=4.67..6.13 rows=1 width=0)
|| Hash Cond: ((test_tab.login)::text =
("ANY_subquery".login)::text)
|| -> Index Only Scan using ui_nkey_test_tab on test_tab
(cost=0.56..2.02 rows=1 width=8)
|| Index Cond: (import_num = '4520460'::numeric)
|| -> Hash (cost=4.09..4.09 rows=1 width=96)
|| -> Subquery Scan on "ANY_subquery" (cost=0.56..4.09
rows=1 width=96) ||
-> HashSetOp Except (cost=0.56..4.08 rows=1 width=100)
|| -> Append (cost=0.56..4.08 rows=2 width=100)
|| -> Subquery Scan on "*SELECT* 1"
(cost=0.56..2.04 rows=1 width=12) ||
-> Unique (cost=0.56..2.03 rows=1 width=8)
|| -> Index Only Scan using
ui_nkey_test_tab on test_tab test_tab_1 (cost=0.56..2.03 rows=1 width=8) ||
Index Cond: ((import_num = '4520460'::numeric) AND (login IS NOT NULL))
|| -> Subquery Scan on "*SELECT* 2"
(cost=0.56..2.04 rows=1 width=12) ||
-> Unique (cost=0.56..2.03 rows=1 width=8)
|| -> Index Only Scan using
ui_nkey_test_tab on test_tab test_tab_2 (cost=0.56..2.03 rows=1 width=8) ||
Index Cond: ((import_num = '0'::numeric) AND (login IS NOT NULL))
|+-------------------------------------------------------------------------------------------------------------------------------------------+(17
rows)
Looking at the row count for import_numselect import_num, count(*) from
test_tab group by import_num order by 2;+------------+--------+| import_num
| count |+------------+--------+| 4520440 | 746982 || 4520460 |
762599 |+------------+--------+(37 rows)With different value of import_num
we are having different execution plan. Is there a way to force the same
Hash semi Join plan to sql with import_num 4520440, currently doing nested
loop.I tried /*+HashJoin(a1 ANY_subquery)*/ but the sql execution plan
doesn't change.SELECT /*+HashJoin(a1 ANY_subquery)*/ count(*) FROM test_tab
a1 WHERE import_num = '4520440' and login IN (SELECT DISTINCT login FROM
test_tab a2 WHERE import_num = '4520440' AND login IS NOT NULL EXCEPT
SELECT DISTINCT login FROM test_tab a3 WHERE import_num = '0' AND login IS
NOT NULL);Regards,Anand

--
View this message in context: http://www.postgresql-archive.org/Performance-Issue-Materialize-tp5979128.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mariel Cherkassky 2017-08-20 06:39:45 Re: performance problem on big tables
Previous Message Mark Kirkwood 2017-08-19 06:51:34 Re: Very poor read performance, query independent