Re: New design for FK-based join selectivity estimation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: New design for FK-based join selectivity estimation
Date: 2016-06-17 14:53:36
Message-ID: 20877.1466175216@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> Thanks for getting the patch into a much better shape. I've quickly
> reviewed the patch this morning before leaving to the airport - I do
> plan to do additional review/testing once in the air or perhaps over the
> weekend. So at the moment I only have a few minor comments:

> 1) Shouldn't we define a new macro in copyfuncs.c, to do the memcpy for
> us? Seems a bit strange we have macros for everything else.

Perhaps, but it seemed not that compelling since we need bespoke code for
those fields in outfuncs.c etc. Maybe it would be worth thinking about
macro infrastructure for array-type fields in all of those modules.

> 2) I'm wondering whether removing the restrict infos when
> if (fkinfo->eclass[i] == rinfo->parent_ec)
> is actually correct. Can't the EC include conditions that do not match
> the FK?

Doesn't matter. The point is that it *does* include a condition that
does match the FK, whether it chose to generate exactly that condition
for this join or some related one.

> I mean something like this:
> CREATE TABLE a (id1 INT PRIMARY KEY, id2 INT);
> CREATE TABLE b (id1 INT REFERENCES a (id1), id2 INT);
> and then something like
> SELECT * FROM a JOIN b ON (a.id1 = b.id1 AND a.id1 = b.id2)

Right. In this case we'll have an EC containing {a.id1, b.id1, b.id2}
which means that equivclass.c will generate a restriction condition
b.id1 = b.id2 to be applied at the scan of b. At the join level, it
has a choice whether to generate a.id1 = b.id1 or a.id1 = b.id2.
It could generate both, but that would be pointlessly inefficient (and
would likely confuse the selectivity estimators, too). But even if it
chooses to generate a.id1 = b.id2, we should recognize that the FK is
matched. What we're effectively doing by dropping that clause in
favor of treating the FK as matched is overridding equivclass.c's
arbitrary choice of which join clause to generate with an equally valid
choice that is easier to estimate for.

> 3) I think this comment in get_foreign_key_join_selectivity is wrong and
> should instead say 'to FK':
> /* Otherwise, see if rinfo was previously matched to EC */

Duh, yeah.

I rewrote the comments in this section to clarify a bit:

/* Drop this clause if it matches any column of the FK */
for (i = 0; i < fkinfo->nkeys; i++)
{
if (rinfo->parent_ec)
{
/*
* EC-derived clauses can only match by EC. It is okay to
* consider any clause derived from the same EC as
* matching the FK: even if equivclass.c chose to generate
* a clause equating some other pair of Vars, it could
* have generated one equating the FK's Vars. So for
* purposes of estimation, we can act as though it did so.
*
* Note: checking parent_ec is a bit of a cheat because
* there are EC-derived clauses that don't have parent_ec
* set; but such clauses must compare expressions that
* aren't just Vars, so they cannot match the FK anyway.
*/
if (fkinfo->eclass[i] == rinfo->parent_ec)
{
remove_it = true;
break;
}
}
else
{
/*
* Otherwise, see if rinfo was previously matched to FK as
* a "loose" clause.
*/
if (list_member_ptr(fkinfo->rinfos[i], rinfo))
{
remove_it = true;
break;
}
}
}

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2016-06-17 15:07:37 Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Previous Message Konstantin Knizhnik 2016-06-17 14:39:41 Re: Restriction of windows functions