Re: Selectivity for lopsided foreign key columns

From: Mikkel Lauritsen <renard(at)tala(dot)dk>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Selectivity for lopsided foreign key columns
Date: 2015-12-17 19:14:13
Message-ID: e904acb8fb9dcefe076bd68ab7547763@webmail.tala.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2015-12-17 16:23, Tom Lane wrote:
> Mikkel Lauritsen <renard(at)tala(dot)dk> writes:
>> The schema contains two tables, t1 and t2.
>> t2 has two fields, an id and a tag, and it contains 146 rows that are
>> unique.
>> t1 has two fields, a value and a foreign key referring to t2.id, and
>> it
>> contains 266177 rows.
>> The application retrieves the rows in t1 that match a specific tag in
>> t2, and it turned out that the contents of t1 were distributed in a
>> very
>> lopsided way, where more than 90% of the rows refer to one of two tags
>> from t2:
>> ...
>> The estimate for the number of rows in the result (1824) is way too
>> low,
>> and that leads to bad plans and queries involving more joins on the
>> tables that run about 1000x slower than they should.
>
>> I have currently rewritten the application code to do two queries; one
>> to retrieve the id from t2 that matches the given tag and one to
>> retrieve the rows from t1, and that's a usable workaround but not
>> something we really like doing as a permanent solution. Fiddling with
>> the various statistics related knobs seems to make no difference, but
>> is
>> there be some other way I can make Postgres assume high selectivity
>> for
>> certain tag values? Am I just SOL with the given schema?
>
> You're pretty much SOL. Lacking cross-column statistics, the planner
> has
> no idea which t2.id goes with the given tag, so it can't see that the
> selected id is the one that is most common in t1. You're getting a
> join size estimate that is basically size of t1 divided by number of
> possible values (146), which is about the best we can do without
> knowing
> which id is selected.

--- snip --

Thanks - I thought as much, but it's really nice to have it confirmed
from
people who are way more knowledgeable.

Best regards and thanks again,
Mikkel Lauritsen

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Mathieu VINCENT 2015-12-18 15:21:04 Re: Estimation row error
Previous Message Adam Brusselback 2015-12-17 18:51:55 Re: Terrible plan choice for view with distinct on clause