BUG #8571: Planner miss-estimates '<null_column> is not true' as not matching any rows

From: tgarnett(at)panjiva(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #8571: Planner miss-estimates '<null_column> is not true' as not matching any rows
Date: 2013-10-31 06:17:11
Message-ID: E1VblZ9-00044Z-5J@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 8571
Logged by: Timothy Garnett
Email address: tgarnett(at)panjiva(dot)com
PostgreSQL version: 9.2.4
Operating system: Ubuntu Linux x86_64 12.04.3 LTS
Description:

The planner seems to be badly miss-estimating the selectivity of '<always
null col> IS NOT true/false', it does not seem to do so for equivalent
expressions such as '<col> is null or not <col>' or expressions where the
selectivity is difficult to determine.

Simplified scenario:

create temporary table test_tbl as (select generate_series as id,
null::boolean as val from generate_series(1,1000));
analyze verbose test_tbl;

select count(*) from test_tbl where val is not true;
=> 1000

explain select * from test_tbl where val is null; -- correctly estimates
1000 rows
=> Seq Scan on test_tbl (cost=0.00..15.00 rows=1000 width=5)
Filter: (val IS NULL)

-- problem case --
explain select * from test_tbl where val is not true; -- estimates only 1
row!
=> Seq Scan on test_tbl (cost=0.00..15.00 rows=1 width=5)
Filter: (val IS NOT TRUE)

explain select * from test_tbl where NOT coalesce(val, false); -- estimates
500 rows (actual 1,000) but will still generally result in a reasonable
plan
=> Seq Scan on test_tbl (cost=0.00..15.00 rows=500 width=5)
Filter: (NOT COALESCE(val, false))

explain select * from test_tbl where val is null or not val; -- correctly
estimates 1000 rows
=> Seq Scan on test_tbl (cost=0.00..15.00 rows=1000 width=5)
Filter: ((val IS NULL) OR (NOT val))

Estimating only one row for the selectivity of a where clause with
additional joins involved when many rows match can result in some very bad
plans which is what we ran into.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Frank van Vugt 2013-10-31 07:08:52 Re: array_agg() on a set larger than some arbitrary(?) limit causes runaway memory usage and eventually memory exhaustion
Previous Message Jeevan Chalke 2013-10-31 05:20:58 Re: surprising to_timestamp behavior