Re: Weird behaviour of the planner

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Weird behaviour of the planner
Date: 2018-08-01 15:06:21
Message-ID: CAECtzeWmeB-w3pmwDML4fAg7F_S1Qp3crKcfydDbQSrDq9Z5=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2018-08-01 16:59 GMT+02:00 David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>:

> On 2 August 2018 at 02:48, Guillaume Lelarge <guillaume(at)lelarge(dot)info>
> wrote:
> > EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
> >
> > QUERY PLAN
> > ------------------------------------------------------------
> ---------------------------------------------------------
> > Unique (cost=12005.97..12049.20 rows=1 width=1430) (actual
> > time=20055.294..20323.348 rows=1 loops=1)
> > -> Sort (cost=12005.97..12006.30 rows=132 width=1430) (actual
> > time=20055.290..20105.738 rows=60000 loops=1)
> > Sort Key: (... 130 columns ...)
> > Sort Method: external sort Disk: 84464kB
> > -> Seq Scan on gleu2 (cost=0.00..12001.32 rows=132 width=1430)
> > (actual time=0.109..114.142 rows=60000 loops=1)
> > Planning time: 10.012 ms
> > Execution time: 20337.854 ms
> > (7 rows)
> >
> > That looks quite good. The total cost is 12049, so I expect this plan to
> > have the smaller cost as it's the choosen plan. Now, I'm disabling Sort,
> and
> > here is what I get:
> >
> > SET enable_sort TO off;
> > EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
> >
> > QUERY PLAN
> > ------------------------------------------------------------
> ---------------------------------------------------------
> > HashAggregate (cost=12044.22..12044.23 rows=1 width=1430) (actual
> > time=508.342..508.343 rows=1 loops=1)
> > Group Key: (... 130 columns ...)
> > -> Seq Scan on gleu2 (cost=0.00..12001.32 rows=132 width=1430)
> (actual
> > time=0.036..57.088 rows=60000 loops=1)
> > Planning time: 1.335 ms
> > Execution time: 508.529 ms
> > (5 rows)
>
> When the planner adds a new path it compares the cost not exactly, but
> with a 'fuzz factor'. It's very likely that the hashagg path did not
> make it as it was not fuzzily any cheaper than the unique path. By
> default, this fuzz factor is 1%.
>
>
That may definitely be it. There's not much of a difference in the total
costs.

It seems in your case the costs don't quite match reality which is
> quite likely due to the poor row estimates on "gleu2". Has that table
> been analyzed recently? or is there some reason that auto-vacuum is
> not getting to it?
>
>
This is a small test case of a much bigger query joining a large number of
tables, materialized views, views (calling functions), etc. The actual plan
contains 84 nodes (32 scans, lots of joins, and a few other nodes). The
mis-estimate is to follow what the big query gives me.

There's a bit more reading of what I'm talking about in
> https://github.com/postgres/postgres/blob/master/src/
> backend/optimizer/util/pathnode.c#L141
>
>
I'm gonna read that. Thank you.

--
Guillaume.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-08-01 15:12:27 Re: ALTER ROLE SET search_path produced by pg_dumpall gives : ERROR: syntax error at or near "$" .
Previous Message David Rowley 2018-08-01 14:59:18 Re: Weird behaviour of the planner