Re: Cardinality estimate of the inner relation

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Frédéric Yhuel <frederic(dot)yhuel(at)dalibo(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Cc: Christophe Courtois <christophe(dot)courtois(at)dalibo(dot)com>
Subject: Re: Cardinality estimate of the inner relation
Date: 2024-11-23 02:07:17
Message-ID: dfb633b2-6e7b-453a-a2ed-a3a32d34bdd6@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 22/11/2024 22:53, Frédéric Yhuel wrote:
> My colleague Christophe Courtois and I have been trying to fix a bad
> plan for one of Dalibo's clients. It is a (probably well-known) problem
> with skewed data and a parameterized Nested Loop with an underestimation
> of the cardinality of the inner relation.
>
> Here is a test case (the script to create and populate the two tables is
> at the end):
Thanks for the case provided!

I wonder if data science has invented a statistic or selectivity
estimation technique that could tackle your case in general. As I see,
we should touch the table of products to realise which specific ID has
the product named 'Babar'.
I can imagine the trick when you build MCV on (id, name) and have a
chance to find this popular ID, cache it, and use it during join clause
estimation later. But it seems too expensive to do the same for
arbitrary incoming queries.

If you want a workaround, such cases fit query-driven techniques. Among
open-source ones, I can point your attention to the AQO extension
(honestly, designed under my command). It can save information about an
estimation error and correct the query next time.

In an enterprise-grade area, you can pick the sr_plan extension, which
is designed to store the plan for a specific query (you can choose
parameterisation on your own) and spread it globally across all
instances' backends.

--
regards, Andrei Lepikhov

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andrei Lepikhov 2024-11-24 12:04:17 Re: Performance of Query 60 on TPC-DS Benchmark
Previous Message Tomas Vondra 2024-11-23 01:38:13 Re: could not send data to client: Connection reset by peer