From: | "Li, Zheng" <zhelli(at)amazon(dot)com> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Steele <david(at)pgmasters(dot)net>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: NOT IN subquery optimization |
Date: | 2019-03-16 21:07:04 |
Message-ID: | 184485F3-7103-476D-BE4A-55C865603DCF@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hey, here is our latest patch. Major changes in this patch include:
1. Use the original hashed subplan if the inner fits in memory as decided by subplan_is_hashable().
2. Fixed the inner relation empty case by adding an inner relation existence check when we pull x out as a filter on the outer (see details below).
3. Integrate David Rowley's routine to use strict predicates and inner join conditions when checking nullability of a Var.
Detailed description of the patch:
NOT IN to ANTI JOIN transformation
If the NOT IN subquery is not eligible for hashed subplan as decided by
subplan_is_hashable(), do the following NOT IN to ANTI JOIN
transformation:
Single expression:
When x is nullable:
t1.x not in (t2.y where p) =>
ANTI JOIN
t1 (Filter: t1.x IS NOT NULL or NOT EXISTS (select 1 from t2 where p)),
t2 on join condition (t1.x=t2.y or t2.y IS NULL)
and p.
The predicate "t2.y IS NULL" can be removed if y is non-nullable.
When x is non-nullable:
t1.x not in (t2.y where p) =>
ANTI JOIN t1, t2 on join condition (t1.x=t2.y or t2.y IS NULL)
and p.
The predicate "t2.y IS NULL" can be removed if y is non-nullable.
Multi expression:
If all xi's are nullable:
(x1, x2, ... xn) not in (y1, y2, ... yn ) =>
ANTI JOIN t1, t2 on join condition:
((t1.x1 = t2.y1) and ... (t1.xi = t2.yi) ... and
(t1.xn = t2.yn)) IS NOT FALSE.
If at least one xi is non-nuallable:
(x1, x2, ... xn) not in (y1, y2, ... yn ) =>
ANTI JOIN t1, t2 on join condition:
(t1.x1 = t2.y1 or t2.y1 IS NULL or t1.x1 IS NULL) and ...
(t1.xi = t2.yi or t2.yi IS NULL) ... and
(t1.xn = t2.yn or t2.yn IS NULL or t1.xn IS NULL).
Add nullability testing routine is_node_nonnullable(), currently it
handles Var, TargetEntry, CoalesceExpr and Const. It uses strict
predicates, inner join conditions and NOT NULL constraint to check
the nullability of a Var.
Adjust and apply reduce_outer_joins() before the transformation so
that the outer joins have an opportunity to be converted to inner joins
prior to the transformation.
We measured performance improvements of two to five orders of magnitude
on most queries in a development environment. In our performance experiments,
table s (small) has 11 rows, table l (large) has 1 million rows. s.n and l.n
have NULL value. s.nn and l.nn are NOT NULL. Index is created on each column.
Cases using hash anti join:
l.nn not in (l.nn) 21900s -> 235ms
l.nn not in (l.nn where u>0) 22000s -> 240ms
l.n not in (l.nn) 21900s -> 238ms
l.n not in (l.nn where u>0) 22000s -> 248ms
Cases using index nested loop anti join
s.n not in (l.nn) 360ms -> 0.5ms
s.n not in (l.nn where u>0) 390ms -> 0.6ms
s.nn not in (l.nn) 365ms -> 0.5ms
s.nn not in (l.nn where u>0) 390ms -> 0.5ms
Cases using bitmap heap scan on the inner and nested loop anti join:
s.n not in (l.n) 360ms -> 0.7ms
l.n not in (l.n) 21900s -> 1.6s
l.n not in (l.n where u>0) 22000s -> 1680ms
s.nn not in (l.n) 360ms -> 0.5ms
l.nn not in (l.n) 21900s -> 1650ms
l.nn not in (l.n where u>0) 22000s -> 1660ms
Cases using the original hashed subplan:
l.n not in (s.n) 63ms -> 63ms
l.nn not in (s.nn) 63ms -> 63ms
l.n not in (s.n where u>0) 63ms -> 63ms
Comments are welcome.
Regards,
-----------
Zheng Li
AWS, Amazon Aurora PostgreSQL
Attachment | Content-Type | Size |
---|---|---|
not_in_to_anti_join_transformation_v2.patch | application/octet-stream | 137.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2019-03-16 21:07:28 | Re: Making all nbtree entries unique by having heap TIDs participate in comparisons |
Previous Message | Peter Geoghegan | 2019-03-16 21:01:48 | Re: Making all nbtree entries unique by having heap TIDs participate in comparisons |