Re: pg9.6 segfault using simple query (related to use fk for join estimates)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Stefan Huehner <stefan(at)huehner(dot)org>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg9.6 segfault using simple query (related to use fk for join estimates)
Date: 2016-05-04 21:26:59
Message-ID: 48919afc-1993-8ca8-5b42-3949e9166e92@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 05/04/2016 11:02 PM, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, May 4, 2016 at 2:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> My other design-level complaint is that basing this on foreign keys is
>>> fundamentally the wrong thing. What actually matters is the unique index
>>> underlying the FK; that is, if we have "a.x = b.y" and there's a
>>> compatible unique index on b.y, we can conclude that no A row will match
>>> more than one B row, whether or not an explicit FK relationship has been
>>> declared. So we should drive this off unique indexes instead of FKs,
>>> first because we will find more cases, and second because the planner
>>> already examines indexes and doesn't need any additional catalog lookups
>>> to get the required data. (IOW, the relcache additions that were made in
>>> this patch series should go away too.)
>
>> Without prejudice to anything else in this useful and detailed review,
>> I have a question about this. A unique index proves that no A row
>> will match more than one B row, and I agree that deriving that from
>> unique indexes is sensible. However, ISTM that an FK provides
>> additional information: we know that, modulo filter conditions on B,
>> every A row will match *exactly* one B row, which can prevent us
>> from *underestimating* the size of the join product. A unique index
>> can't do that.
>
> Very good point, but unless I'm missing something, that is not what the
> current patch does. I'm not sure offhand whether that's an important
> estimation failure mode currently, or if it is whether it would be
> sensible to try to implement that rule entirely separately from the "at
> most one" aspect, or if it isn't sensible, whether that's a sufficiently
> strong reason to confine the "at most one" logic to working only with FKs
> and not with bare unique indexes.

FWIW it's a real-world problem with multi-column FKs. As David pointed
out upthread, a nice example of this issue is Q9 in the TPC-H benchmark,
where the underestimate leads the planner to pick a HashAggregate, which
then fails with an out-of-memory error.
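
A rough sketch of the arithmetic behind that kind of underestimate, not
taken from the patch or from planner output. It assumes the join clauses
are estimated independently, each with a selectivity of about
1/max(ndistinct), and uses approximate TPC-H scale-factor-1 row counts
for the lineitem/partsupp join on (partkey, suppkey). The function names
are made up for illustration.

```python
# Illustrative only: approximate TPC-H scale-factor-1 cardinalities.
LINEITEM_ROWS = 6_000_000   # referencing table (approximate)
PARTSUPP_ROWS = 800_000     # referenced table, unique on (partkey, suppkey)
PART_NDISTINCT = 200_000    # distinct partkey values
SUPP_NDISTINCT = 10_000     # distinct suppkey values


def independent_estimate(outer_rows, inner_rows, ndistinct_pairs):
    """Multiply per-clause selectivities as if the clauses were independent.

    Each equality clause is assumed to have selectivity 1/max(nd_outer, nd_inner).
    """
    selectivity = 1.0
    for nd_outer, nd_inner in ndistinct_pairs:
        selectivity *= 1.0 / max(nd_outer, nd_inner)
    return outer_rows * inner_rows * selectivity


def fk_estimate(outer_rows, inner_rows):
    """Treat the whole multi-column FK as one clause.

    Every referencing row matches exactly one referenced row, so the
    combined selectivity is 1/inner_rows (ignoring filters on the inner side).
    """
    return outer_rows * inner_rows * (1.0 / inner_rows)


naive = independent_estimate(
    LINEITEM_ROWS,
    PARTSUPP_ROWS,
    [(PART_NDISTINCT, PART_NDISTINCT), (SUPP_NDISTINCT, SUPP_NDISTINCT)],
)
with_fk = fk_estimate(LINEITEM_ROWS, PARTSUPP_ROWS)

print(f"independent clauses: {naive:,.0f} rows")
print(f"FK-aware estimate:   {with_fk:,.0f} rows")
```

With these numbers the independent-clause estimate comes out several
thousand times smaller than the FK-aware one, which is the sort of gap
that can make a hash-based plan look much cheaper than it really is.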

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
