Re: Yet another abort-early plan disaster on 9.3

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-01 18:56:32
Message-ID: 542C4E60.3030505@agliodbs.com
Lists: pgsql-hackers pgsql-performance

On 09/30/2014 04:01 PM, Simon Riggs wrote:
> On 30 September 2014 18:28, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>>> Anyway, in the particular case I posted fixing n_distinct to realistic
>>> numbers (%) fixed the query plan.
>>
>>
>> But wouldn't fixing the absolute number also have fixed the plan?
>
> There are two causes of this issue.
>
> 1. Poor estimates of n_distinct. Fixable by user.
>
> 2. Poor assumption of homogeneous distribution. No way for user to
> fix. Insufficient stats detail to be able to solve in current planner.
>
> I see (2) as the main source of issues, since as we observe, (1) is fixable.

I disagree that (1) is not worth fixing just because we've provided
users with an API to override the stats. It would unquestionably be
better for us to have a better n_distinct estimate in the first place.
Further, this is an easier problem to solve, and fixing n_distinct
estimates would fix a large minority of currently pathological queries.
It's like saying "hey, we don't need to fix the leak in your radiator,
we've given you a funnel in the dashboard you can pour water into."
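
For concreteness, the override being discussed ("an API to override the stats") is the per-column n_distinct storage parameter; the table, column, and value here are illustrative:

```sql
-- Illustrative only: tell the planner that user_id has a number of distinct
-- values equal to 5% of the row count (negative values mean a fraction of
-- reltuples; positive values are taken as an absolute count).
ALTER TABLE events ALTER COLUMN user_id SET (n_distinct = -0.05);
ANALYZE events;  -- the override takes effect at the next ANALYZE
```

A percentage-style (negative) setting, as in the example Josh mentions upthread, keeps tracking table growth, whereas an absolute setting goes stale as the table grows.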

I do agree that (2) is worth fixing *as well*. As a first
approximation, one possibility (as Tom suggests) would be to come up
with a mathematical model for a selectivity estimate which falls somewhere
*between* the homogeneous distribution and the worst case. While that
wouldn't solve a lot of cases, it would be a start towards having a
better model.
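
As a rough sketch of what such a "between" model could look like (all numbers hypothetical; this is not planner code), one could blend the homogeneous-distribution estimate of rows scanned before a LIMIT is satisfied with the worst case where all matches are clustered at the end of the scan:

```python
# Hypothetical sketch, not PostgreSQL planner code.
# For "WHERE ... LIMIT k" over n_rows with match fraction `selectivity`:
#  - homogeneous assumption: matches are evenly spread, scan ~ k / selectivity
#  - worst case: all matches sit at the end, scan almost the whole table

def rows_scanned(n_rows, limit, selectivity, pessimism):
    homogeneous = limit / selectivity
    n_matches = n_rows * selectivity
    worst_case = n_rows - n_matches + limit
    # pessimism = 0.0 reproduces today's assumption; 1.0 is fully pessimistic
    return (1 - pessimism) * homogeneous + pessimism * worst_case

n, k, s = 10_000_000, 10, 0.001
print(rows_scanned(n, k, s, 0.0))   # ~10,000 rows (current model)
print(rows_scanned(n, k, s, 1.0))   # ~9,990,010 rows (worst case)
print(rows_scanned(n, k, s, 0.5))   # in between
```

The point of the sketch is only that even a crude pessimism factor keeps the abort-early estimate from being off by three orders of magnitude when the homogeneity assumption fails.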

>> I don't think correlation is up to the task as a complete solution, although
>> it might help a little. There is no way a simple correlation can encode
>> that John retired 15 years ago and hasn't logged on since, while Johannes
>> was hired yesterday and never logged on before then.
>
> Ah, OK, essentially the same example.
>
> Which is why I ruled out correlation stats based approaches and
> suggested a risk-weighted cost approach.

By "risk-weighted" you mean just adjusting cost estimates based on what
the worst case cost looks like, correct? That seemed to be your
proposal from an earlier post. If so, we're in violent agreement here.
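
Restated with made-up numbers (an illustration of the idea, not a proposal for actual cost values): a risk weight blends each plan's expected cost with its worst-case cost, so a plan that is cheap on average but catastrophic when the stats are wrong no longer wins:

```python
# Illustration only: the plan costs and the risk weight are hypothetical.

def risk_weighted_cost(expected, worst_case, risk_weight=0.25):
    return (1 - risk_weight) * expected + risk_weight * worst_case

# Abort-early index scan: cheap if matches are spread evenly, awful if not.
abort_early = risk_weighted_cost(expected=1_000, worst_case=5_000_000)
# Full scan + sort + limit: predictable cost either way.
sort_limit = risk_weighted_cost(expected=120_000, worst_case=150_000)

print(abort_early)  # 1250750.0
print(sort_limit)   # 127500.0 -> the predictable plan now wins
```

On raw expected cost the abort-early plan wins by two orders of magnitude; once the worst case is weighted in, the predictable plan is chosen instead.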

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
