Re: Hints (was Poor performance using CTE)

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Kevin Grittner <kgrittn(at)mail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Craig James <cjames(at)emolecules(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, David Greco <David_Greco(at)harte-hanks(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints (was Poor performance using CTE)
Date: 2012-11-22 00:53:56
Message-ID: CAMkU=1yvc7qV3h57zzcG6eJvJFZN03V-J2nR6pSdJXXdJBB+=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Nov 21, 2012 at 8:05 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 21.11.2012 15:42, Kevin Grittner wrote:
>>
>> Better, IMV, would be to identify what sorts of hints people actually
>> find useful, and use that as the basis for TODO items for optimizer
>> improvement as well as inventing clear ways to specify the desired
>> coercion. I liked the suggestion that a CTE which didn't need to be
>> materialized because of side-effects or multiple references have a
>> keyword. Personally, I think that AS MATERIALIZED x (SELECT ...)
>> would be preferable to AS x (SELECT ... OFFSET 0) as the syntax to
>> specify that.
>
>
> Rather than telling the planner what to do or not to do, I'd much rather
> have hints that give the planner more information about the tables and quals
> involved in the query. A typical source of bad plans is when the planner
> gets its cost estimates wrong. So rather than telling the planner to use a
> nested loop join for "a INNER JOIN b ON a.id = b.id", the user could tell
> the planner that there are only 10 rows that match the "a.id = b.id" qual.

For each a.id there are 10 b.id, or for each b.id there are 10 a.id?

> That gives the planner the information it needs to choose the right plan on
> its own. That kind of hints would be much less implementation specific and
> much more likely to still be useful, or at least not outright
> counter-productive, in a future version with a smarter planner.

When I run into unexpectedly poor performance, I have an intuitive
enough feel for my own data that I know what plan it ought to be
using. Figuring out why it is not using it is very hard. For one
thing, EXPLAIN tells you about the "winning" plan, but there is no
visibility into what ought to be the winning plan but isn't, so no way
to see why it isn't. So you first have to use our existing non-hint
hints (enable_*, doing weird things with cost_*, CTE stuff) to trick
it into using the plan I want it to use, before I can figure out why
it isn't using it, before I could figure out what hints of the style
you are suggesting to supply to get it to use it.

So I think the type of hints you are suggesting would be about as hard
for the user to use as debugging the planner for the particular case
would be. While the more traditional type of hint is easy to use,
because the end user understands their data more than they understand
the guts of the planner.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Eildert Groeneveld 2012-11-22 07:54:04 Re: fast read of binary data
Previous Message Craig Ringer 2012-11-22 00:42:51 Re: Poor performance using CTE