NOT IN subquery optimization

From: Jim Finnerty <jfinnert(at)amazon(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: NOT IN subquery optimization
Date: 2019-02-20 23:44:49
Message-ID: 1550706289606-0.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The semantics of NOT IN (SELECT ...) are subtly different from the semantics
of NOT EXISTS (SELECT ...). These differences center on how NULLs are
treated, and in general can result in statements that are harder to optimize
and slower to execute than the apparently similar NOT EXISTS statement.

A little over a year ago, Christian Antognini authored the blog "/How Well a
Query Optimizer Handles Subqueries?/" summarizing his findings about the
performance of PostgreSQL, MySQL, and Oracle on various subqueries:


https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/

His position was that you can classify the optimizations as correct or
incorrect, and based on that he provided the following comparison summary
(see below). In short, PostgreSQL was the worst of the three systems:

"Summary

The number of queries that the query optimizers handle correctly are
the following:

Oracle Database 12.2: 72 out of 80
MySQL 8.0.3: 67 out of 80
PostgreSQL 10.0: 60 out of 80

Since not all queries are handled correctly, for best performance it is
sometimes necessary to rewrite them."

The subqueries that were found to be optimized "incorrectly" were almost
entirely due to poor or absent NOT IN subquery optimization.

The PostgreSQL community has been aware of the deficiencies in NOT IN
optimization for quite some time. Based on an analysis of
psgsql-performance posts between 2013 and 2015, Robert Haas identified NOT
IN optimization as one of the common root causes of performance problems.

We have been working on improved optimization of NOT IN, and we would like
to share this optimizaton with the community. With respect to the test
cases mentioned in the blog post mentioned above, it will elevate PostgreSQL
from "worst" to "first". Generally the performance gains are large when the
optimization applies, though we have found one test case where performance
is worse. We are investigating this now to see if we can disable the
optimization in that case.

We would like to include a patch for this change in the current commitfest.
This thread can be used to track comments about this optimization.

-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2019-02-20 23:46:09 Re: WAL insert delay settings
Previous Message Tomas Vondra 2019-02-20 23:35:24 Re: WAL insert delay settings