Re: pb with join plan

From: Marc Millas <marc(dot)millas(at)mokadb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: pb with join plan
Date: 2023-06-21 16:10:13
Message-ID: CADX_1aZj-9r2ano85Tmvwrt4NfS_pv3vsqZzxJLb1hXN0DLDtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Marc MILLAS

On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:

> On 6/21/23 00:26, Marc Millas wrote:
> >
> >
> > On Tue, Jun 20, 2023 at 11:19 PM David Rowley <dgrowleyml(at)gmail(dot)com
> > <mailto:dgrowleyml(at)gmail(dot)com>> wrote:
> >
> > On Wed, 21 Jun 2023 at 08:34, Marc Millas <marc(dot)millas(at)mokadb(dot)com
> > <mailto:marc(dot)millas(at)mokadb(dot)com>> wrote:
> > >
> > > On Tue, Jun 20, 2023 at 10:14 PM David Rowley
> > <dgrowleyml(at)gmail(dot)com <mailto:dgrowleyml(at)gmail(dot)com>> wrote:
> > >>
> > >> On Wed, 21 Jun 2023 at 07:42, Marc Millas <marc(dot)millas(at)mokadb(dot)com
> > <mailto:marc(dot)millas(at)mokadb(dot)com>> wrote:
> > >> > But if I do the same with clause one OR clause 2, I have to
> > kill the request after an hour, seeing the filesystem showing more
> > than 140 Mb of increased usage.
>
> It's a bit weird the "victor" table is joined seemingly without any join
> conditions, leading to a cross join (which massively inflates the cost
> for joins above it). Maybe the anonymized plan mangles it somehow.
>

So I did try to simplify my pb.
I create a table with the result of the first 3 joins.
That table do have 15M lines. all tables have been vacuum analyze

Now if I do an explain analyze of a simple join between that table and my
original table 4
using a simple = clause, I get a result in one second (around). and the
planner guesses for rows seems in line with the observed values .
if I use a substr(table1.a)= table2.b, the explain analyze get a result in
21 seconds and the planner estimates a 65M rows result set while the
observed is 330 k rows
so here its 20 times slower and the discrepency between planner rows guess
and reality is a 200 ratio.

Now, if I try an explain analyze with join on a=b or substr(c)=d or e=f
then... I kill the query after a quarter an hour without any answer.
if I try to just explain the query, the planner rows guess becomes more
than 2 Billions....
the extremely simple query and plan are here, without automatic obfuscation
https://explain.depesz.com/s/b8Ll

>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Marc MILLAS

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kirk Wolak 2023-06-22 00:52:32 Re: Adding SHOW CREATE TABLE
Previous Message Dominique Devienne 2023-06-21 14:51:05 Re: libpq: What can and cannot be bound? How to know?