From: AgentM <agentm(at)themactionfaction(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: A query planner that learns
Date: 2006-10-13 15:53:15
Message-ID: 1B7FAD85-20E9-40AE-8018-724F94E132F8@themactionfaction.com
Lists: pgsql-general
On Oct 13, 2006, at 11:47, John D. Burger wrote:
> Erik Jones wrote:
>
>> Forgive me if I'm way off here, as I'm not all that familiar with
>> the internals of Postgres, but isn't this what the genetic query
>> optimizer discussed in one of the manual's appendixes is supposed
>> to do?
>
> No - it's not an "optimizer" in that sense. When the set of tables
> involved is small enough, the planner uses a dynamic programming
> algorithm to explore the entire space of possible plans. But the
> space grows exponentially (I think) with the number of tables -
> when this would take too long, the planner switches to a genetic
> algorithm approach, which explores a small fraction of the plan
> space in a guided manner.
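
To put rough numbers on that explosion: the count of possible join
orderings can be sketched with a couple of standard combinatorial
formulas (illustrative only - the real planner prunes aggressively and
considers more than just join order):

```python
import math

def left_deep_orders(n):
    # Each permutation of n tables gives one left-deep join order: n!
    return math.factorial(n)

def bushy_orders(n):
    # All binary (bushy) join trees over n ordered leaves:
    # (2(n-1))! / (n-1)!
    return math.factorial(2 * (n - 1)) // math.factorial(n - 1)

for n in (4, 8, 12):
    print(n, left_deep_orders(n), bushy_orders(n))
```

Even restricted to left-deep plans, 12 tables already give 12! (about
479 million) orderings, which is roughly where exhaustive search stops
being practical.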
>
> But with both approaches, the planner is just using the static
> statistics gathered by ANALYZE to estimate the cost of each
> candidate plan, and these statistics are based on sampling your
> data - they may be wrong, or at least misleading. (In particular,
> the statistic for total number of unique values is frequently =way=
> off, per a recent thread here. I have been reading about this,
> idly thinking about how to improve the estimate.)
>
> The idea of a learning planner, I suppose, would be one that
> examines cases where these statistics lead to very misguided
> expectations. The simplest version of a "learning" planner could
> simply bump up the statistics targets on certain columns. A
> slightly more sophisticated idea would be for some of the
> statistics to optionally use parametric modeling (this column is a
> Gaussian, let's estimate the mean and variance, this one is a Beta
> distribution ...). Then the smarter planner could spend some
> cycles applying more sophisticated statistical modeling to
> problematic tables/columns.
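
The parametric idea above can be sketched in a few lines: fit a
Gaussian to a column sample, then estimate the selectivity of a range
predicate from the fitted CDF instead of a histogram. This is a toy
model, not PostgreSQL code; all names here are made up:

```python
import math
import random

def fit_gaussian(sample):
    # Estimate mean and variance from a column sample (the parametric model).
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, var

def normal_cdf(x, mean, var):
    # CDF of a normal distribution via the error function.
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def range_selectivity(lo, hi, mean, var):
    # Estimated fraction of rows with lo < col <= hi under the fitted model.
    return normal_cdf(hi, mean, var) - normal_cdf(lo, mean, var)

random.seed(0)
sample = [random.gauss(100.0, 15.0) for _ in range(10_000)]
mean, var = fit_gaussian(sample)
sel = range_selectivity(85.0, 115.0, mean, var)  # roughly +/- 1 sigma
```

Two parameters per column replace an entire histogram when the
distributional assumption holds - and can be badly wrong when it
doesn't, which is why the smarter planner would have to verify the fit.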
One simple first step would be to run an ANALYZE whenever a
sequential scan is executed. Is there a reason not to do this? It
could be controlled by a GUC variable in case someone wants
repeatable plans.
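
For comparison, autovacuum already does something threshold-shaped:
it re-analyzes a table once the number of changed rows exceeds a base
threshold plus a fraction of the table size. A toy model of that
triggering rule (names illustrative, not the actual GUC names):

```python
def needs_analyze(changed_rows, table_rows, base_threshold=50, scale_factor=0.1):
    # Re-analyze when accumulated changes outgrow the table:
    # changed > base + scale * rows (loosely modeled on autovacuum's rule).
    return changed_rows > base_threshold + scale_factor * table_rows

print(needs_analyze(200, 1000))  # enough churn to warrant re-analyze
print(needs_analyze(100, 1000))  # below threshold, skip
```

Triggering on sequential scans instead would tie re-analysis to the
moment stale statistics actually hurt, rather than to write volume.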
Further down the line, statistics could be collected during the
execution of any query - updating histograms on delete and update as
well.
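
A minimal sketch of what incremental maintenance could mean for an
equi-width histogram - bucket counts adjusted per row change instead of
rebuilt by ANALYZE (toy code, not how PostgreSQL stores statistics):

```python
class EquiWidthHistogram:
    # Toy incremental histogram: per-bucket counts are adjusted as rows
    # are inserted or deleted, instead of rebuilding via a full sample.
    def __init__(self, lo, hi, nbuckets):
        self.lo = lo
        self.width = (hi - lo) / nbuckets
        self.counts = [0] * nbuckets

    def _bucket(self, value):
        # Clamp out-of-range values into the edge buckets.
        b = int((value - self.lo) / self.width)
        return min(max(b, 0), len(self.counts) - 1)

    def insert(self, value):
        self.counts[self._bucket(value)] += 1

    def delete(self, value):
        self.counts[self._bucket(value)] -= 1
```

The open question is cost: per-row bookkeeping on every update and
delete is exactly the overhead ANALYZE's periodic sampling avoids.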
-M