Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: "Ehrenreich, Sigrid" <Ehrenreich(at)consist(dot)de>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join
Date: 2020-10-26 22:25:45
Message-ID: CAApHDvqh4wSDOD+UVZ2xdaymu_-6rvkxjdcTQHOVAT-RTCdzog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid <Ehrenreich(at)consist(dot)de> wrote:
> -> Hash Join (cost=226.27..423.82 rows=115 width=0) (actual time=3.150..7.511 rows=3344 loops=1) <=========== With the FK, the estimation should be 3344, but it is 115 rows

I'd have expected this to find the foreign key and have the join
selectivity of 1.0, but I see it does not due to the fact that one of
the EquivalenceClass has a constant due to the fact.low_card = 1 qual.

In build_join_rel() we call build_joinrel_restrictlist() to get the
join quals that need to be evaluated at the join level, but we only
get the fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2
quals there. The low_card qual gets pushed down to the scan level on
each side of the join, so no need for it to get evaluated at the join
level. Later in build_join_rel() we do set_joinrel_size_estimates().
The restrictlist with just the two quals is what we pass to
get_foreign_key_join_selectivity(). Only two of the foreign key
columns are matched there, therefore we don't class that as a match
and just leave it up to the normal selectivity functions.

I feel like we could probably do better there and perhaps somehow
count ECs with ec_has_const as matched, but there seems to be some
assumptions later in get_foreign_key_join_selectivity() where we
determine the selectivity based on the base rel's tuple count. We'd
need to account for how many rows remainder after filtering the ECs
with ec_has_const == true, else we'd be doing the wrong thing. That
needs more thought than I have time for right now.

Your case would work if the foreign key had been on just anydata1 and
anydata2, but there's not much chance of that working without a unique
index on those two columns.

Extended statistics won't help you here either since they're currently
not used for join estimations.

David

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2020-10-26 22:54:01 Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join
Previous Message Justin Pryzby 2020-10-26 20:34:13 Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join