From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-18 20:08:39
Message-ID: CAGTBQpbdqT=1NuMPUcM6RZLhWVhr9H1c8QFZiy9240n9OG8Srw@mail.gmail.com
Lists: pgsql-hackers
On Tue, Mar 18, 2014 at 4:48 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> > That alone could improve things considerably, and statistical info
> > could be propagated along expressions to make it possible to model
> > uncertainty in complex expressions as well.
>
> But how would that work? I see no solution adumbrated there :-).
I would have to work through the SQL expression grammar node type by
node type for this, but I don't think it would be impossible. Most
non-function expression nodes seem rather trivial. Even for CASE, as
long as you have a distribution for the conditional, you can derive a
distribution for the whole. User-defined functions would be another
game, though. Correlation would have to be measured, and that can be
troublesome - it is a weak spot of risk computation as much as it is
of planning - but it could be fuzzed arbitrarily until properly
computed. After all, dependency on correlation or non-correlation is a
known source of risk, and accounting for it in any way is better than
not accounting for it at all.
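
To make that concrete, here is a toy sketch of what propagating a
(mean, variance) pair through a CASE node could look like. The
representation and names are made up purely for illustration, not a
proposal for actual planner code:

    def case_node_estimate(p, then_est, else_est):
        # then_est / else_est are (mean, variance) pairs for each
        # branch; p is the estimated probability that the conditional
        # holds. The result is a two-component mixture, with the
        # branches assumed independent of the conditional - exactly
        # the kind of fuzzing mentioned above.
        m1, v1 = then_est
        m2, v2 = else_est
        mean = p * m1 + (1 - p) * m2
        # E[X^2] by total expectation, then Var[X] = E[X^2] - E[X]^2
        second_moment = p * (v1 + m1 ** 2) + (1 - p) * (v2 + m2 ** 2)
        return (mean, second_moment - mean ** 2)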
> Let's say you change the rowcount estimate to low/bestguess/high *and*
> you only engage extra searches when there is enough disparity between
> those values; you still get exponentially more searches.
I was under the impression the planner already did an exhaustive
search for some queries, so it's just a matter of picking the best
plan among those (i.e., estimating cost). The GEQO case isn't any
different, except that perhaps a risk-decreasing transformation would
need to be introduced, unless I'm missing something.
> (my thinking is that if bestguess estimated execution time is some
> user-definable amount faster than low/high at any node, a more
> skeptical plan is introduced). All that could end up being pessimal
> to the general case though.
I think the cost estimate would be replaced by a distribution
(simplified perhaps into an array of moments, or whatever is easily
manipulated in the face of complex expressions). What the user would
pick is how that distribution gets summarized into a single comparable
number; plans then get measured by the user's stick (say: arithmetic
mean, median, 90th percentile, etc.). The arithmetic mean would, I
guess, be the default, and that ought to be roughly equivalent to the
planner's current behavior.
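
Purely illustrative again (names made up, and a small Monte Carlo
sample standing in for whatever compact representation the planner
would actually carry), plan selection under a user-picked stick could
look like:

    import statistics

    def pick_plan(cost_samples, metric="mean"):
        # cost_samples maps each candidate plan to a list of costs
        # sampled from its cost distribution; 'metric' is the user's
        # chosen stick.
        def score(plan):
            s = sorted(cost_samples[plan])
            if metric == "mean":    # roughly today's behavior
                return statistics.mean(s)
            if metric == "median":
                return statistics.median(s)
            if metric == "p90":     # a risk-averse choice
                return s[int(0.9 * (len(s) - 1))]
            raise ValueError("unknown metric: %s" % metric)
        return min(cost_samples, key=score)

The mean default would behave much like today's single-number cost
comparison, while p90 would shy away from plans with long cost tails.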