Re: *_collapse_limit, geqo_threshold

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: *_collapse_limit, geqo_threshold
Date: 2009-07-08 16:42:20
Message-ID: 20090708164220.GF22588@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

When I was first familiarizing myself with PostgreSQL, I took a
walk through its documentation on GECO and similar processes in
the literature. One big advantage of GECO is that you can trade
off planning time for plan optimization. I do agree that it should
be updated, but there were definite cases in the literature where
the planning time for exhaustive searches could take orders of
magnitude more time to execute than the differences in the execution
times of the differing plans.

My two cents,
Ken

On Wed, Jul 08, 2009 at 09:43:12AM -0400, Noah Misch wrote:
> On Tue, Jul 07, 2009 at 09:31:14AM -0500, Kevin Grittner wrote:
> > I don't remember any clear resolution to the wild variations in plan
> > time mentioned here:
> >
> > http://archives.postgresql.org/pgsql-hackers/2009-06/msg00743.php
> >
> > I think it would be prudent to try to figure out why small changes in
> > the query caused the large changes in the plan times Andres was
> > seeing. Has anyone else ever seen such behavior? Can we get
> > examples? (It should be enough to get the statistics and the schema,
> > since this is about planning time, not run time.)
>
> With joins between statistically indistinguishable columns, I see planning times
> change by a factor of ~4 for each join added or removed (postgres 8.3). Varying
> join_collapse_limit in the neighborhood of the actual number of joins has a
> similar effect. See attachment with annotated timings. The example uses a
> single table joined to itself, but using distinct tables with identical contents
> yields the same figures.
>
> The expontential factor seems smaller for real queries. I have a query of
> sixteen joins that takes 71s to plan deterministically; it looks like this:
>
> SELECT 1 FROM fact JOIN dim0 ... JOIN dim6
> JOIN t t0 ON fact.key = t.key AND t.x = MCV0
> LEFT JOIN t t1 ON fact.key = t.key AND t.x = MCV1
> JOIN t t2 ON fact.key = t.key AND t.x = MCV2
> LEFT JOIN t t3 ON fact.key = t.key AND t.x = NON-MCV0
> LEFT JOIN t t4 ON fact.key = t.key AND t.x = NON-MCV1
> LEFT JOIN t t5 ON fact.key = t.key AND t.x = NON-MCV2
> LEFT JOIN t t6 ON fact.key = t.key AND t.x = NON-MCV3
> LEFT JOIN t t7 ON fact.key = t.key AND t.x = NON-MCV4
>
> For the real query, removing one join drops plan time to 26s, and removing two
> drops the time to 11s. I don't have a good theory for the multiplier changing
> from 4 for the trivial demonstration to ~2.5 for this real query. Re-enabling
> geqo drops plan time to .5s. These tests used default_statistics_target = 1000,
> but dropping that to 100 does not change anything dramatically.
>
> > I guess the question is whether there is anyone who has had a contrary
> > experience. (There must have been some benchmarks to justify adding
> > geqo at some point?)
>
> I have queries with a few more joins (19-21), and I cancelled attempts to plan
> them deterministically after 600+ seconds and 10+ GiB of memory usage. Even
> with geqo_effort = 10, they plan within 5-15s with good results.
>
> All that being said, I've never encountered a situation where a value other than
> 1 or <inf> for *_collapse_limit appeared optimal.
>
> nm

> SET geqo = off;
> SET join_collapse_limit = 100;
> CREATE TEMP TABLE t AS SELECT * FROM generate_series(1, 1000) f(n); ANALYZE t;
>
> --- Vary join count
> -- 242.4s
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10 NATURAL JOIN t t11
> NATURAL JOIN t t12;
> -- 31.2s
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10 NATURAL JOIN t t11;
> -- 8.1s
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
> -- 2.0s
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08 NATURAL JOIN t t09;
> -- 0.5s
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08;
>
> --- Vary join_collapse_limit
> -- 8.1s
> SET join_collapse_limit = 100;
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
> -- 8.0s
> SET join_collapse_limit = 11;
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
> -- 2.2s
> SET join_collapse_limit = 10;
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
> -- 0.5s
> SET join_collapse_limit = 9;
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
> -- 0.1s
> SET join_collapse_limit = 8;
> EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t
> t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07
> NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Browne 2009-07-08 16:43:37 Re: New types for transparent encryption
Previous Message Tom Lane 2009-07-08 16:40:37 Re: multi-threaded pgbench