Select queries which violates table constrains

From: Joni Martikainen <joni(at)shade-fx(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Select queries which violates table constrains
Date: 2014-05-10 18:24:21
Message-ID: 536E6ED5.6050409@shade-fx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I investigated some select query performance issues and noticed that
postgresql misses some obvious cases while processing SELECT query. I
mean the case where WHERE clause contains statement which condition
would be against table structure. (excuse my language, look the code)

Example:
Let the table be :

CREATE TABLE test
(
id numeric(3,0) NOT NULL,
somecolumn numeric(5,0) NOT NULL,
CONSTRAINT id_pk PRIMARY KEY (id)
);

Simple table with "somecolumn" column which has constraint NOT NULL.

Let's do a following query to the table.

SELECT somecolumn FROM test WHERE somecolumn IS NULL;

Result is empty result set which is obvious because any null value would
be against the table constrain.
The thing here is that postgresql does SeqScan to this table in order to
find out if there is any null values.

Explain:
"Seq Scan on test (cost=0.00..1.06 rows=1 width=5)"
" Filter: (somecolumn IS NULL)"
"Planning time: 0.778 ms"

SeqScan can be avoided by making index for "somecolumn" and indexing all
the null values. That index would be empty and very fast but also very
pointless since table constraint here is simple.
No one would do such a query in real life but some programmatically
generated queries does this kind of things. Only way I found to go
around this problem was to create those empty indexies but I think the
query optimizer could be smarter here.

I took a look of the optimizer code and I didn't find any code which
avoids this kind of situations. (I expect that it would be optimizer's
task to find out this kind of things)

I was thinking some feature for optimizer where the optimizer could add
a hint for an executor if some query plan path leads to the empty result
set case. If executor sees this hint it could avoid doing seqscan and
actually even index scans. This kind of query constraint vs. table
constraint comparison should be anyway cheaper process to execute than
seqscan.

The question is that, is there any reason why such an optimization phase
could not be implemented? Another question is that how is the query
engine handling the partitioned table case? Am i right that table
partitions are solved by table constrains and indexies are used to
validate which child table to look for? And so forth could this kind of
new optimization phase benefit partitioned tables?

Kind regards
Joni Martikainen

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2014-05-10 19:41:25 Lossy bitmap scan is broken in GIN
Previous Message Tomas Vondra 2014-05-10 18:21:55 Re: Sending out a request for more buildfarm animals?