From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jim Nasby <jim(at)nasby(dot)net> |
Subject: | Re: Patch to support SEMI and ANTI join removal |
Date: | 2014-10-08 11:21:44 |
Message-ID: | CAApHDvo+uZuPUSRFmJYMNEe5=70eEro5=HuX=tciZRfTRJ5C1A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Oct 7, 2014 at 3:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Oct 6, 2014 at 5:57 AM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > Can anyone shed any light on how I might determine where the scan rel is
> in
> > the tree? I need to find it so I can check if the RangeTblEntry is
> marked as
> > skip-able.
>
> I think you're probably going to need logic that knows about
> particular node types and does the right thing for each one. I think
> - but maybe I'm misunderstanding - what you're looking for is a
> function of the form Oid ScansOnePlainTableWithoutQuals(). The
> algorithm could be something like:
>
> switch (node type)
> {
> case T_SeqScanState:
> if (no quals)
> return the_appropriate_oid;
> return false;
> case T_HashJoin:
> decide whether we can ignore one side of the join
> fish out the node from the other side of the join (including
> reaching through the Hash node if necessary)
> return ScansOnePlainTableWithoutQuals(the node we fished out);
> ...other specific cases...
> default:
> return false;
> }
>
Thanks Robert.
Ok, so I've been hacking away at this for a couple of evenings and I think
I have a working prototype finally!
My earlier thoughts about having to descend down until I find a seqscan
were wrong. It looks like just need to look at the next node down, if it's
a seqscan and it's marked as not needed, then we can skip that side of the
join, or if the child node is a HashJoinState then check the skip status of
that node, if both sides are marked as not needed, then skip that side of
the join.
I've just completed some simple performance tests:
create table t3 (id int primary key);
create table t2 (id int primary key, t3_id int not null references t3);
create table t1 (id int primary key, t2_id int not null references t2);
I've loaded these tables with 4 million rows each.The performance is as
follows:
test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join
t3 on t2.t3_id=t3.id;
count
---------
4000000
(1 row)
Time: 1022.492 ms
test=# select count(*) from t1;
count
---------
4000000
(1 row)
Time: 693.642 ms
test=# alter table t2 drop constraint t2_t3_id_fkey;
ALTER TABLE
Time: 2.141 ms
test=# alter table t1 drop constraint t1_t2_id_fkey;
ALTER TABLE
Time: 1.678 ms
test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join
t3 on t2.t3_id=t3.id;
count
---------
4000000
(1 row)
Time: 11682.525 ms
So it seems it's not quite as efficient as join removal at planning time,
but still a big win when it's possible to perform the join skipping.
As yet, I've only added support for hash joins, and I've not really looked
into detail on what's needed for nested loop joins or merge joins.
For hash join I just added some code into the case HJ_BUILD_HASHTABLE: in
ExecHashJoin(). The code just checks if any side can be skipped, if they
can then the node will never move out of the HJ_BUILD_HASHTABLE state, so
that each time ExecHashJoin() is called, it'll just return the next tuple
from the non-skip side of the join until we run out of tuples... Or if both
sides can be skipped NULL is returned as any nodes above this shouldn't
attempt to scan, perhaps this should just be an Assert() as I don't think
any parent nodes should ever bother executing that node if it's not
required. The fact that I've put this code into the switch
under HJ_BUILD_HASHTABLE makes me think it should add about close to zero
overhead for when the join cannot be skipped.
One thing that we've lost out of this execution time join removal checks
method is the ability to still remove joins where the join column is
NULLable. The previous patch added IS NOT NULL to ensure the query was
equivalent when a join was removed, of course this meant that any
subsequent joins may later not have been removed due to the IS NOT NULL
quals existing (which could restrict the rows and remove the ability that
the foreign key could guarantee the existence of exactly 1 row matching the
join condition). I've not yet come up with a nice way to reimplement these
null checks at execution time, so I've thought that perhaps I won't bother,
at least not for this patch. I'd just disable join skipping at planning
time if any of the join columns can have NULLs.
Anyway this is just a progress report, and also to say thanks to Andres,
you might just have saved this patch with the execution time checking idea.
I'll post a patch soon, hopefully once I have merge and nestloop join types
working.
Regards
David Rowley
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2014-10-08 11:40:08 | Re: Patch to support SEMI and ANTI join removal |
Previous Message | Anssi Kääriäinen | 2014-10-08 10:10:12 | Re: Promise index tuples for UPSERT |