Re: Help with row estimate problem

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Jon Zeppieri <zeppieri(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Help with row estimate problem
Date: 2024-07-30 15:34:20
Message-ID: 3b34a69b-c81b-475d-a5cd-9e45298201b3@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 29/7/2024 22:51, Jon Zeppieri wrote:
> Of course, I'd prefer not to have to materialize this relation
> explicitly. This particular query, for this particular user, benefits
> from it, but similar queries or queries for different users may not.
>
> I think the root of the problem is that population size (i.e., the
> number of members in a given population) has a high variance, and then
> planner is basing its estimates on the average population size (and
> maybe the average number of populations to which a user has access?),
> which is not especially useful. Is there anything I can do about this?
> Would any extended statistics be useful here?
Thanks for report. I see such cases frequently enough and the key
problem here is data skew, as you already mentioned. Extended statistics
doesn't help here. Also, because we can't estimate specific values
coming from the outer NestLoop - we can't involve MCV to estimate
selectivity of the population. That's the reason why the optimiser uses
ndistinct value.
What you can do? I see only one option - split the table to some
partitions where data will be distributed more or less uniformly. And
invent a criteria for pruning unnecessary partitions.
Of course, you can also try pg_hint_plan and force planner to use
MergeJoin or HashJoin in that suspicious case.

--
regards, Andrei Lepikhov

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jon Zeppieri 2024-07-30 17:22:11 Re: Help with row estimate problem
Previous Message Jon Zeppieri 2024-07-29 20:51:40 Help with row estimate problem