Re: nested loop semijoin estimates

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Wong <markwkm(at)gmail(dot)com>
Subject: Re: nested loop semijoin estimates
Date: 2015-06-02 14:37:59
Message-ID: 28688.1433255879@sss.pgh.pa.us
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The
> results are good, IMHO.

> With TPC-H, I've used 1GB and 4GB datasets, and I've seen no plan
> changes at all. I don't plan to run the tests on larger data sets, as I
> expect the behavior to remain the same, considering the uniformity of
> TPC-H data sets.

> With TPC-DS (using the 63 queries supported by PostgreSQL), I've seen
> two cases of plan changes - see the plans attached. In both cases
> however the plan change results in much better performance. While on
> master the queries took 23 and 18 seconds, with the two patches it's
> only 7 and 3. This is just the 1GB dataset. I'll repeat the test with
> the 4GB dataset and post an update if there are any changes.

I'm a bit disturbed by that, because AFAICS from the plans, these queries
did not involve any semi or anti joins, which should mean that the patch
would not have changed the planner's behavior. You were using the second
patch as-posted, right, without further hacking on
compare_path_costs_fuzzily?

It's possible that the change was due to random variation in ANALYZE
statistics, in which case it was just luck.

regards, tom lane
