Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	"Ehrenreich, Sigrid" <Ehrenreich(at)consist(dot)de>
Cc:	"pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject:	Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join
Date:	2020-10-26 22:25:45
Message-ID:	CAApHDvqh4wSDOD+UVZ2xdaymu_-6rvkxjdcTQHOVAT-RTCdzog@mail.gmail.com
Lists:	pgsql-performance

On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid <Ehrenreich(at)consist(dot)de> wrote:
> -> Hash Join (cost=226.27..423.82 rows=115 width=0) (actual time=3.150..7.511 rows=3344 loops=1) <=========== With the FK, the estimation should be 3344, but it is 115 rows

I'd have expected this to find the foreign key and give the join a
selectivity of 1.0, but I see it does not, because one of the
EquivalenceClasses contains a constant due to the fact.low_card = 1 qual.

In build_join_rel() we call build_joinrel_restrictlist() to get the
join quals that need to be evaluated at the join level, but we only
get the fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2
quals there. The low_card qual gets pushed down to the scan level on
each side of the join, so there is no need to evaluate it at the join
level. Later in build_join_rel() we call set_joinrel_size_estimates().
The restrictlist with just the two quals is what we pass to
get_foreign_key_join_selectivity(). Only two of the foreign key
columns are matched there, therefore we don't class that as a match
and just leave it up to the normal selectivity functions.
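
To make the matching rule concrete, here is a minimal sketch in Python. It is a simplified model, not PostgreSQL's C code: the function name fk_fully_matched, the tuple representation of quals, and the assumption that the foreign key covers (low_card, anydata1, anydata2) are all illustrative.

```python
# Simplified model of the "every foreign key column must be matched" rule.
# This is NOT PostgreSQL's implementation; names and data shapes are hypothetical.

def fk_fully_matched(fk_columns, join_quals):
    """Return True only if every FK column pair appears among the join quals."""
    qual_set = {frozenset(q) for q in join_quals}
    return all(frozenset(pair) in qual_set for pair in fk_columns)

# Assumed three-column foreign key from fact to dim.
fk_columns = [
    ("fact.low_card", "dim.low_card"),
    ("fact.anydata1", "dim.anydata1"),
    ("fact.anydata2", "dim.anydata2"),
]

# Join-level restrictlist: the low_card equality is absent because it was
# turned into constant comparisons and pushed down to the scans.
join_quals = [
    ("fact.anydata1", "dim.anydata1"),
    ("fact.anydata2", "dim.anydata2"),
]

# Three-column FK: only two columns matched, so it is not treated as a match.
matched = fk_fully_matched(fk_columns, join_quals)

# A hypothetical FK on just anydata1 and anydata2 would be fully matched.
two_col_fk_matched = fk_fully_matched(fk_columns[1:], join_quals)

print(matched, two_col_fk_matched)
```

In this model the three-column key fails the check, so the estimate falls through to the ordinary per-clause selectivity functions.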

I feel like we could probably do better there and perhaps somehow
count ECs with ec_has_const as matched, but there seem to be some
assumptions later in get_foreign_key_join_selectivity() where we
determine the selectivity based on the base rel's tuple count. We'd
need to account for how many rows remain after filtering on the ECs
with ec_has_const == true, else we'd be doing the wrong thing. That
needs more thought than I have time for right now.
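
A rough arithmetic sketch of that concern, in Python. All row counts are made up for illustration, and the formula (outer rows times inner rows times one over the referenced row count) is a simplification of the inner-join case, not the actual planner code.

```python
# Hypothetical numbers; none of these come from the reported plan.
dim_tuples = 1000.0   # total rows in dim (the base rel's tuple count)
dim_rows = 100.0      # dim rows left after the low_card = 1 filter
fact_rows = 5000.0    # fact rows left after the low_card = 1 filter

def join_rows(outer_rows, inner_rows, referenced_count):
    """Join size with selectivity 1 / referenced_count (simplified model)."""
    return outer_rows * inner_rows / referenced_count

# Selectivity based on the unfiltered tuple count: the filter on dim is
# effectively applied twice, so the estimate comes out too low.
naive = join_rows(fact_rows, dim_rows, dim_tuples)

# Selectivity based on the rows remaining after the constant filter:
# each surviving fact row is expected to find exactly one dim row.
adjusted = join_rows(fact_rows, dim_rows, dim_rows)

print(naive, adjusted)
```

Under these assumed numbers the adjusted figure equals the filtered fact row count, which is what a foreign key match should give, while the naive figure is smaller by the fraction of dim rows that pass the filter.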

Your case would work if the foreign key had been on just anydata1 and
anydata2, but there's not much chance of that working without a unique
index on those two columns.

Extended statistics won't help you here either since they're currently
not used for join estimations.

David

In response to

Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join at 2020-10-26 15:58:05 from Ehrenreich, Sigrid

Responses

Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join at 2020-10-26 22:54:01 from Tom Lane
