From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PATCH: use foreign keys to improve join estimates v1 |
Date: | 2015-09-27 12:00:03 |
Message-ID: | CAKJS1f-Ad5Fs9Fsay_=LZT_Ep81r+qU3pVLcecz+a1+BJMVp0g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 26 September 2015 at 01:57, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:
> Hi,
>
> On 09/25/2015 03:39 AM, David Rowley wrote:
>
>> I looked at this again, and I'm not all that sure it's possible to
>>
> assume that 1.0 / <tuples> is valid when there's more than one
>> relation at either side of the join.
>>
> >
>
>> My reasoning for this is that the whole basis for the patch is that a
>> if we find a foreign key match, then we can be sure enough, as far as
>> row estimations go, that exactly 1 tuple will exist matching that
>> condition. This assumption of the 1 tuple no longer holds when 2
>> relations have already been joined, as this can either cause tuple
>> duplication, or elimination.
>>
>
> I don't see why that would be the case. Of course, you need to take the
> right <tuples>, i.e. the "target" of the foreign key (the table with UNIQUE
> constraint) so that the selectivity matches the fact that exactly 1 tuple
> (on the PK side) matches.
>
hmm, ok. You're right. It appears I was a bit confused, but thanks for
explaining it again. I get it now.
I've been working on this again. I've put back the code that you wrote for
the looping over each combination of relations from either side of the join.
I've also added some code to get around the problem with eclass joins and
the RestrictInfo having some alternative Vars that don't belong to the
foreign key. Basically I'm just checking if the RestrictInfo has a
parent_ec, and if it does just loop over the members to try and find the
Vars that belong to the foreign key. I've tested it with the following, and
it seems to work:
create table a as select i as a_id1, i as a_id2, i as dummy1 from
generate_series(0,999) s(i);
alter table a add unique (a_id1, a_id2);
create table b as select i as b_id1, i as b_id2 from generate_series(0,332)
s(i);
analyze a;
analyze b;
alter table b add foreign key (b_id1, b_id2) references a (a_id1, a_id2);
explain analyze select * from a inner join b on a.dummy1 = b.b_id1 and
a.a_id2 = b.b_id2 where a.a_id1 = a.dummy1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Hash Join (cost=18.57..26.41 rows=2 width=20) (actual time=0.775..1.046
rows=333 loops=1)
Hash Cond: ((b.b_id1 = a.dummy1) AND (b.b_id2 = a.a_id2))
-> Seq Scan on b (cost=0.00..5.33 rows=333 width=8) (actual
time=0.013..0.046 rows=333 loops=1)
-> Hash (cost=18.50..18.50 rows=5 width=12) (actual time=0.737..0.737
rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 51kB
-> Seq Scan on a (cost=0.00..18.50 rows=5 width=12) (actual
time=0.014..0.389 rows=1000 loops=1)
Filter: (dummy1 = a_id1)
The non-patched version estimates 1 row. The patched estimates 2 rows, but
that's due to the bad estimate on dummy1 = a_id1.
The 2 comes from ceil(5 * 0.333).
Perhaps you have a better test case to for this?
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
estimation-with-fkeys-v2_davidv3.patch | application/octet-stream | 20.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2015-09-27 14:50:38 | Re: pg_dump LOCK TABLE ONLY question |
Previous Message | Filip Rembiałkowski | 2015-09-27 10:43:14 | pg_dump LOCK TABLE ONLY question |