Re: pg9.6 segfault using simple query (related to use fk for join estimates)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Stefan Huehner <stefan(at)huehner(dot)org>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: pg9.6 segfault using simple query (related to use fk for join estimates)
Date: 2016-04-30 17:35:12
Message-ID: 22731.1462037712@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com> writes:
> On 29/04/2016 18:05, Tom Lane wrote:
>> Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com> writes:
>>> The segfault is caused by quals_match_foreign_key() calling get_leftop()
>>> and get_rightop() on a ScalarArrayOpExpr node.
>>>
>>> I'm not sure that assuming this compatibility is the right way to fix
>>> this though.

>> It certainly isn't.

> Agreed. New attached patch handles explicitly each node tag.

No, this is completely nuts. The problem is that quals_match_foreign_key
is simply assuming that every clause in the list it's given is an OpExpr,
which is quite wrong. It should just ignore non-OpExpr quals, since it
cannot do anything useful with them anyway. There's a comment claiming
that non-OpExpr quals were already rejected:

* Here since 'usefulquals' only contains bitmap indexes for quals
* of type "var op var" we can safely skip checking this.

but that doesn't appear to have anything to do with current reality.

While this in itself is about a two-line fix, after reviewing
137805f89acb3611 I'm pretty unhappy that it got committed at all.
I think this obvious bug is a good sign that it wasn't ready.
Other unfinished aspects like invention of an undocumented GUC
don't leave a good impression either.

Moreover, it looks to me like this will add quite a lot of overhead,
probably far more than is justified, because clauselist_join_selectivity
is basically O(N^2) in the relation-footprint of the current join --- and
not with a real small multiplier, either, as the functions it calls
contain about four levels of nested loops themselves. Maybe that's
unmeasurable on trivial test cases but it's going to be disastrous in
large queries, or for relations with large numbers of foreign keys.

I think this should be reverted and pushed out to 9.7 development.
It needs a significant amount of rewriting to fix the performance
issue, and now's not the time to be doing that.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomasz Rybak 2016-04-30 18:25:18 Re: Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 - > 10.0
Previous Message Joshua D. Drake 2016-04-30 17:29:12 Re: Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0