Re: BUG #14107: Major query planner bug regarding subqueries and indices

From: Mathias Kunter <mathiaskunter(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14107: Major query planner bug regarding subqueries and indices
Date: 2016-04-22 14:02:49
Message-ID: 69e376c7-6c2c-c96d-2ad6-3734affb59b7@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> Thanks for the report, but I think your subject line is a little
> exaggerated.

Sorry for that :-)

> I'd not actually thought about expanding this to IN and NOT IN, but
> likely it would be possible, providing NULLs could be handled
> correctly too. Was this the optimisation you think is missing?

No - let's consider a real-world example where the problem becomes more
evident. Assume we have the following two tables:

CREATE TABLE book (id SERIAL NOT NULL, name VARCHAR, authorId SERIAL,
CONSTRAINT book_pkey PRIMARY KEY (id));

CREATE TABLE author (id SERIAL NOT NULL, surname VARCHAR, CONSTRAINT
author_pkey PRIMARY KEY (id));

All relevant columns are indexed. The book table contains 1.3 M rows.

Now let's say we want to find books by either name or by author name. A
simple problem, actually. The following query is used (which could of
course also be rewritten to use a JOIN instead, but that's not the point
here). We find out that it has an unexpected and unacceptably long
execution time:

EXPLAIN SELECT * FROM book WHERE name = 'Harry Potter' OR authorId IN
(SELECT id FROM author WHERE surname = 'Rowling');
QUERY PLAN
-------------------------------------------------------------------------------------------
Seq Scan on book (cost=13.68..26792.88 rows=576709 width=40)
Filter: (((name)::text = 'Harry Potter'::text) OR (hashed SubPlan 1))
SubPlan 1
-> Bitmap Heap Scan on author (cost=4.20..13.67 rows=6 width=4)
Recheck Cond: ((surname)::text = 'Rowling'::text)
-> Bitmap Index Scan on author_surname_index
(cost=0.00..4.20 rows=6 width=0)
Index Cond: ((surname)::text = 'Rowling'::text)

In contrast to that, consider the query plan for the following logically
equivalent query, which executes in just a few milliseconds (!):

EXPLAIN SELECT * FROM book WHERE name = 'Harry Potter' UNION SELECT *
FROM book WHERE authorId IN (SELECT id FROM author WHERE surname =
'Rowling');
QUERY PLAN
------------------------------------------------------------------------------------------------------------
HashAggregate (cost=454.87..455.99 rows=112 width=29)
Group Key: book.id, book.name, book.authorid
-> Append (cost=0.43..454.03 rows=112 width=29)
-> Index Scan using book_name_index on book
(cost=0.43..16.48 rows=3 width=29)
Index Cond: ((name)::text = 'Harry Potter'::text)
-> Nested Loop (cost=4.63..436.43 rows=109 width=29)
-> Bitmap Heap Scan on author (cost=4.20..13.67 rows=6
width=4)
Recheck Cond: ((surname)::text = 'Rowling'::text)
-> Bitmap Index Scan on author_surname_index
(cost=0.00..4.20 rows=6 width=0)
Index Cond: ((surname)::text = 'Rowling'::text)
-> Index Scan using book_authorid_index on book book_1
(cost=0.43..70.28 rows=18 width=29)
Index Cond: (authorid = author.id)

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Mathias Kunter 2016-04-22 14:08:54 Re: BUG #14107: Major query planner bug regarding subqueries and indices
Previous Message David Rowley 2016-04-22 08:56:49 Re: BUG #14107: Major query planner bug regarding subqueries and indices