From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Allowing NOT IN to use ANTI joins |
Date: | 2014-07-15 11:58:30 |
Message-ID: | CAApHDvqRB-iFBy68=dCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jul 14, 2014 at 8:55 PM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> Since the patch has not been marked as rejected I was thinking that I'd
> take a bash at fixing it up, but if you think this is a waste of time,
> please let me know.
>
>
>
I've made some changes to the patch so that it only allows the conversion
to ANTI JOIN to take place if both the outer query's expressions AND the
subquery's target list can be proved not to have NULLs.
I've attached a delta, which is the changes I've made on top of Tom's
cleaned up version of my patch, and also a full patch.
I've also performed some benchmarks to try to determine how much time it
takes to execute this null checking code. I ended up hacking the code a
little for the benchmarks and just put the null checking function in a
tight loop that performed 100000 iterations.
Like:
if (under_not)
{
int x;
bool result;
for (x = 0; x < 100000; x++)
{
result = is_NOTANY_compatible_with_antijoin(parse, sublink);
}
if (!result)
return NULL;
}
I then ran 6 queries, 3 times each through the planner and grabbed the
"Planning Time" from the explain analyze result.
I then removed the extra looping code (seen above) and compiled the code as
it is with the attached patch.
I then ran each of the 6 queries again 3 times each and noted down the
"Planning Time from the explain analyze result.
In my results I assumed that the first set of times divided by 100000
would be the time taken to perform the NULL checks... This is not quite
accurate, but all the other planning work was quite well drowned out by the
100k loop.
I found that the call to is_NOTANY_compatible_with_antijoin adds about 0.2%
and 2.3% to total planning time. Though the 2.3% was quite an extreme case,
and the 0.2% was the most simple case I could think of.
I've attached the complete results in html format. I've also attached the
schema that I used and all 6 queries tested.
Here's 2 points which I think are important to note about the planning time
overhead of this patch:
1. There is no additional overhead if the query has no NOT IN clause.
2. The test queries 3 and 6 were to benchmark the overhead of when the NOT
NULL test fails. The slowest of these was test 3 which added just under
0.5% to the planning time. The query that added a 2.3% overhead performed
an ANTI JOIN, so likely the reduction in execution time more than made up
for the extra planning time.
Regards
David Rowley
Attachment | Content-Type | Size |
---|---|---|
not_in_benchmark_schema.sql | text/plain | 1.7 KB |
NOTIN_Planner_Benchmark.htm | text/html | 10.5 KB |
not_in_anti_join_v0.9.delta.patch | application/octet-stream | 14.5 KB |
not_in_anti_join_v0.9.patch | application/octet-stream | 37.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2014-07-15 12:14:27 | Re: Removing dependency to wsock32.lib when compiling code on WIndows |
Previous Message | Magnus Hagander | 2014-07-15 11:46:27 | Re: Removing dependency to wsock32.lib when compiling code on WIndows |