From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | "Ehrenreich, Sigrid" <Ehrenreich(at)consist(dot)de> |
Cc: | "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join |
Date: | 2020-10-26 22:25:45 |
Message-ID: | CAApHDvqh4wSDOD+UVZ2xdaymu_-6rvkxjdcTQHOVAT-RTCdzog@mail.gmail.com |
Lists: | pgsql-performance |
On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid <Ehrenreich(at)consist(dot)de> wrote:
> -> Hash Join (cost=226.27..423.82 rows=115 width=0) (actual time=3.150..7.511 rows=3344 loops=1) <=========== With the FK, the estimation should be 3344, but it is 115 rows
I'd have expected this to find the foreign key and have the join
selectivity of 1.0, but I see it does not, because one of the
EquivalenceClasses has a constant due to the fact.low_card = 1 qual.
In build_join_rel() we call build_joinrel_restrictlist() to get the
join quals that need to be evaluated at the join level, but we only
get the fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2
quals there. The low_card qual gets pushed down to the scan level on
each side of the join, so there's no need for it to be evaluated at the
join level. Later in build_join_rel() we call set_joinrel_size_estimates().
The restrictlist with just those two quals is what we pass to
get_foreign_key_join_selectivity(). Only two of the foreign key
columns are matched there, so we don't class that as a match
and just leave it up to the normal selectivity functions.
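
To make the matching rule above concrete, here is a minimal toy model in Python. It is not PostgreSQL's C code; the function name `fk_fully_matched` is made up for illustration, and the three-column foreign key on (low_card, anydata1, anydata2) is an assumption inferred from this discussion. It only shows why a restrictlist holding two of the key's column pairs fails an all-columns-must-match test.

```python
# Toy model only: this is NOT PostgreSQL's C code. The three-column foreign
# key below is an assumption inferred from the discussion.

def fk_fully_matched(fk_column_pairs, join_quals):
    """Return True only if every FK column pair appears among the join quals."""
    # Treat each equality qual as an unordered pair, so a = b matches b = a.
    quals = {frozenset(qual) for qual in join_quals}
    return all(frozenset(pair) in quals for pair in fk_column_pairs)

# Assumed foreign key: fact(low_card, anydata1, anydata2) references dim(...).
fk = [
    ("fact.low_card", "dim.low_card"),
    ("fact.anydata1", "dim.anydata1"),
    ("fact.anydata2", "dim.anydata2"),
]

# Only these two quals reach the join level; the low_card = 1 qual was
# pushed down to the scans on each side.
join_restrictlist = [
    ("fact.anydata1", "dim.anydata1"),
    ("fact.anydata2", "dim.anydata2"),
]

print(fk_fully_matched(fk, join_restrictlist))  # prints False
```

In this model the check would pass only if a fact.low_card = dim.low_card qual were also present in the list, which is the piece the constant removes from the join level.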
I feel like we could probably do better there and perhaps somehow
count ECs with ec_has_const as matched, but there seem to be some
assumptions later in get_foreign_key_join_selectivity() where we
determine the selectivity based on the base rel's tuple count. We'd
need to account for how many rows remain after filtering the ECs
with ec_has_const == true, else we'd be doing the wrong thing. That
needs more thought than I have time for right now.
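
The concern about the base rel's tuple count can be illustrated with some arithmetic. This is a sketch under stated assumptions, not a proposed fix: it assumes the foreign-key selectivity is one over the referenced table's raw tuple count, and every number below is hypothetical rather than taken from the reported plan. The helper name `inner_join_rows` is also made up.

```python
# Hypothetical numbers, chosen only to show the direction of the error.

def inner_join_rows(outer_rows, inner_rows, ref_tuples):
    """Estimated join size when the join selectivity is 1 / ref_tuples."""
    return outer_rows * inner_rows / ref_tuples

dim_tuples = 1000      # raw row count of the referenced table (assumed)
dim_filtered = 100     # dim rows left after low_card = 1 (assumed)
fact_filtered = 5000   # fact rows left after low_card = 1 (assumed)

# Counting the constant EC as matched but still dividing by the raw count:
naive_estimate = inner_join_rows(fact_filtered, dim_filtered, dim_tuples)

# Dividing by the rows that remain after the constant filter instead:
adjusted_estimate = inner_join_rows(fact_filtered, dim_filtered, dim_filtered)

print(naive_estimate)     # 500.0
print(adjusted_estimate)  # 5000.0
```

With a valid foreign key each filtered fact row should find exactly one dim row, so the expected join size here is 5000. Under these assumptions only the second calculation gets there, which is why the rows remaining after the constant filter would need to be accounted for.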
Your case would work if the foreign key had been on just anydata1 and
anydata2, but there's not much chance of that working without a unique
index on those two columns.
Extended statistics won't help you here either since they're currently
not used for join estimations.
David