From: | greigwise <greigwise(at)comcast(dot)net> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Table with many NULLS for indexed column yields strange query plan |
Date: | 2020-03-06 01:08:53 |
Message-ID: | 1583456933425-0.post@n3.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I have a query like this:
SELECT "table1".* FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id" WHERE
"table3"."number" = 'xxxx'
AND ("table2"."type") IN ('Standard') ;
table2 has a large number of NULLS in the column table3_id. There is an
index on this column. Here is the result of explain analyze:
Merge Join (cost=1001.20..4076.67 rows=17278 width=167) (actual
time=284.918..300.167 rows=2244 loops=1)
Merge Cond: (table2.table3_id = table3.id)
-> Gather Merge (cost=1000.93..787825.78 rows=621995 width=175) (actual
time=5.786..283.269 rows=64397 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Nested Loop (cost=0.87..712740.12 rows=155499 width=175)
(actual time=0.091..102.708 rows=13107 loops=5)
-> Parallel Index Scan using index_table2_on_table3_id on
table2 (cost=0.43..489653.08 rows=155499 width=16) (actual
time=0.027..22.327 rows=13107 loops=5)
Filter: ((type)::text = 'Standard'::text)
-> Index Scan using table1_pk on table1 (cost=0.44..1.43
rows=1 width=167) (actual time=0.005..0.005 rows=1 loops=65535)
Index Cond: (id = table2.table1_id)
-> Index Scan using table3_pkey on table3 (cost=0.27..53.40 rows=1
width=8) (actual time=0.041..0.048 rows=1 loops=1)
Filter: ((number)::text = 'xxxx'::text)
Rows Removed by Filter: 35
Planning time: 0.450 ms
Execution time: 310.230 ms
You can see the row estimate there is way off on the Parallel Index Scan.
I suspect that this is because it's including the rows with null in the
selectivity estimate even though the table3_id can't possibly be null here
due to the inner join.
If I modify the query like this:
SELECT "table1".* FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id" WHERE
"table3"."number" = 'xxxx'
AND ("table2"."type") IN ('Standard') and table3_id is not null;
Just adding in table3_id is not null at the end there, I get a much better
plan.
Nested Loop (cost=1.14..290.04 rows=66 width=167) (actual
time=0.058..11.258 rows=2244 loops=1)
-> Nested Loop (cost=0.70..64.46 rows=66 width=8) (actual
time=0.049..2.873 rows=2244 loops=1)
-> Index Scan using table3_pkey on table3 (cost=0.27..53.40
rows=1 width=8) (actual time=0.030..0.035 rows=1 loops=1)
Filter: ((number)::text = 'xxxx'::text)
Rows Removed by Filter: 35
-> Index Scan using index_table2_on_table3_id on table2
(cost=0.43..11.05 rows=1 width=16) (actual time=0.017..2.102 rows=2244
loops=1)
Index Cond: ((table3_id = table3.id) AND (table3_id IS NOT
NULL))
Filter: ((type)::text = 'Standard'::text)
-> Index Scan using table1_pk on table1 (cost=0.44..3.42 rows=1
width=167) (actual time=0.003..0.003 rows=1 loops=2244)
Index Cond: (id = table2. id)
Planning time: 0.403 ms
Execution time: 11.672 ms
Can I do anything statistics wise so that I get a better plan here or do I
have to modify the query. It seems kinda hacky that I would have to specify
is not null on that column since like I said it can't possibly be null.
Thanks,
Greig Wise
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Haylock | 2020-03-06 02:39:29 | What do null column values for pg_stat_progress_vacuum mean? |
Previous Message | Rory Campbell-Lange | 2020-03-05 20:52:38 | Re: Advice request : simultaneous function/data updates on many databases |