From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Stefan Huehner <stefan(at)huehner(dot)org>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg9.6 segfault using simple query (related to use fk for join estimates) |
Date: | 2016-05-04 21:26:59 |
Message-ID: | 48919afc-1993-8ca8-5b42-3949e9166e92@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 05/04/2016 11:02 PM, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, May 4, 2016 at 2:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> My other design-level complaint is that basing this on foreign keys is
>>> fundamentally the wrong thing. What actually matters is the unique index
>>> underlying the FK; that is, if we have "a.x = b.y" and there's a
>>> compatible unique index on b.y, we can conclude that no A row will match
>>> more than one B row, whether or not an explicit FK relationship has been
>>> declared. So we should drive this off unique indexes instead of FKs,
>>> first because we will find more cases, and second because the planner
>>> already examines indexes and doesn't need any additional catalog lookups
>>> to get the required data. (IOW, the relcache additions that were made in
>>> this patch series should go away too.)
>
>> Without prejudice to anything else in this useful and detailed review,
>> I have a question about this. A unique index proves that no A row
>> will match more than one B row, and I agree that deriving that from
>> unique indexes is sensible. However, ISTM that an FK provides
>> additional information: we know that, modulo filter conditions on B,
>> every A row will match *exactly* one row B row, which can prevent us
>> from *underestimating* the size of the join product. A unique index
>> can't do that.
>
> Very good point, but unless I'm missing something, that is not what the
> current patch does. I'm not sure offhand whether that's an important
> estimation failure mode currently, or if it is whether it would be
> sensible to try to implement that rule entirely separately from the "at
> most one" aspect, or if it isn't sensible, whether that's a sufficiently
> strong reason to confine the "at most one" logic to working only with FKs
> and not with bare unique indexes.
FWIW it's a real-world problem with multi-column FKs. As David pointed
out upthread, a nice example of this issue is Q9 in the TPC-H bench,
where the underestimate leads to HashAggregate and then OOM failure.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2016-05-04 21:28:24 | Re: pg_dump dump catalog ACLs |
Previous Message | Kevin Grittner | 2016-05-04 21:22:41 | Re: what to revert |