Table with many NULLS for indexed column yields strange query plan

From: greigwise <greigwise(at)comcast(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Table with many NULLS for indexed column yields strange query plan
Date: 2020-03-06 01:08:53
Message-ID: 1583456933425-0.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a query like this:

SELECT "table1".* FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id" WHERE
"table3"."number" = 'xxxx'
AND ("table2"."type") IN ('Standard') ;

table2 has a large number of NULLS in the column table3_id. There is an
index on this column. Here is the result of explain analyze:

Merge Join (cost=1001.20..4076.67 rows=17278 width=167) (actual
time=284.918..300.167 rows=2244 loops=1)
Merge Cond: (table2.table3_id = table3.id)
-> Gather Merge (cost=1000.93..787825.78 rows=621995 width=175) (actual
time=5.786..283.269 rows=64397 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Nested Loop (cost=0.87..712740.12 rows=155499 width=175)
(actual time=0.091..102.708 rows=13107 loops=5)
-> Parallel Index Scan using index_table2_on_table3_id on
table2 (cost=0.43..489653.08 rows=155499 width=16) (actual
time=0.027..22.327 rows=13107 loops=5)
Filter: ((type)::text = 'Standard'::text)
-> Index Scan using table1_pk on table1 (cost=0.44..1.43
rows=1 width=167) (actual time=0.005..0.005 rows=1 loops=65535)
Index Cond: (id = table2.table1_id)
-> Index Scan using table3_pkey on table3 (cost=0.27..53.40 rows=1
width=8) (actual time=0.041..0.048 rows=1 loops=1)
Filter: ((number)::text = 'xxxx'::text)
Rows Removed by Filter: 35
Planning time: 0.450 ms
Execution time: 310.230 ms

You can see the row estimate there is way off on the Parallel Index Scan.
I suspect that this is because it's including the rows with null in the
selectivity estimate even though the table3_id can't possibly be null here
due to the inner join.

If I modify the query like this:

SELECT "table1".* FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id" WHERE
"table3"."number" = 'xxxx'
AND ("table2"."type") IN ('Standard') and table3_id is not null;

Just adding in table3_id is not null at the end there, I get a much better
plan.

Nested Loop (cost=1.14..290.04 rows=66 width=167) (actual
time=0.058..11.258 rows=2244 loops=1)
-> Nested Loop (cost=0.70..64.46 rows=66 width=8) (actual
time=0.049..2.873 rows=2244 loops=1)
-> Index Scan using table3_pkey on table3 (cost=0.27..53.40
rows=1 width=8) (actual time=0.030..0.035 rows=1 loops=1)
Filter: ((number)::text = 'xxxx'::text)
Rows Removed by Filter: 35
-> Index Scan using index_table2_on_table3_id on table2
(cost=0.43..11.05 rows=1 width=16) (actual time=0.017..2.102 rows=2244
loops=1)
Index Cond: ((table3_id = table3.id) AND (table3_id IS NOT
NULL))
Filter: ((type)::text = 'Standard'::text)
-> Index Scan using table1_pk on table1 (cost=0.44..3.42 rows=1
width=167) (actual time=0.003..0.003 rows=1 loops=2244)
Index Cond: (id = table2. id)
Planning time: 0.403 ms
Execution time: 11.672 ms

Can I do anything statistics wise so that I get a better plan here or do I
have to modify the query. It seems kinda hacky that I would have to specify
is not null on that column since like I said it can't possibly be null.

Thanks,

Greig Wise

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mark Haylock 2020-03-06 02:39:29 What do null column values for pg_stat_progress_vacuum mean?
Previous Message Rory Campbell-Lange 2020-03-05 20:52:38 Re: Advice request : simultaneous function/data updates on many databases