From: AgentM <agentm(at)themactionfaction(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: A query planner that learns
Date: 2006-10-13 15:53:15
Message-ID: 1B7FAD85-20E9-40AE-8018-724F94E132F8@themactionfaction.com
Lists: pgsql-general
On Oct 13, 2006, at 11:47, John D. Burger wrote:
> Erik Jones wrote:
>
>> Forgive me if I'm way off here, as I'm not all that familiar with
>> the internals of Postgres, but isn't this what the genetic query
>> optimizer discussed in one of the manual's appendixes is supposed
>> to do?
>
> No - it's not an "optimizer" in that sense. When the set of tables
> involved is small enough, the planner uses a dynamic programming
> algorithm to explore the entire space of possible plans. But the
> space grows exponentially (I think) with the number of tables -
> when this would take too long, the planner switches to a genetic
> algorithm approach, which explores a small fraction of the plan
> space in a guided manner.
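
To put rough numbers on that explosion: the count of possible join
orderings can be sketched with a couple of standard combinatorial
formulas (illustrative only - the real planner prunes aggressively and
considers more than just join order):

```python
import math

def left_deep_orders(n):
    # Each permutation of n tables gives one left-deep join order: n!
    return math.factorial(n)

def bushy_orders(n):
    # All binary (bushy) join trees over n ordered leaves:
    # (2(n-1))! / (n-1)!
    return math.factorial(2 * (n - 1)) // math.factorial(n - 1)

for n in (4, 8, 12):
    print(n, left_deep_orders(n), bushy_orders(n))
```

Even restricted to left-deep plans, 12 tables already give 12! (about
479 million) orderings, which is roughly where exhaustive search stops
being practical.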
>
> But with both approaches, the planner is just using the static
> statistics gathered by ANALYZE to estimate the cost of each
> candidate plan, and these statistics are based on sampling your
> data - they may be wrong, or at least misleading. (In particular,
> the statistic for total number of unique values is frequently =way=
> off, per a recent thread here. I have been reading about this,
> idly thinking about how to improve the estimate.)
>
> The idea of a learning planner, I suppose, would be one that
> examines cases where these statistics lead to very misguided
> expectations. The simplest version of a "learning" planner could
> simply bump up the statistics targets on certain columns. A
> slightly more sophisticated idea would be for some of the
> statistics to optionally use parametric modeling (this column is a
> Gaussian, let's estimate the mean and variance, this one is a Beta
> distribution ...). Then the smarter planner could spend some
> cycles applying more sophisticated statistical modeling to
> problematic tables/columns.
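
The parametric idea above can be sketched in a few lines: fit a
Gaussian to a column sample, then estimate the selectivity of a range
predicate from the fitted CDF instead of a histogram. This is a toy
model, not PostgreSQL code; all names here are made up:

```python
import math
import random

def fit_gaussian(sample):
    # Estimate mean and variance from a column sample (the parametric model).
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, var

def normal_cdf(x, mean, var):
    # CDF of a normal distribution via the error function.
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def range_selectivity(lo, hi, mean, var):
    # Estimated fraction of rows with lo < col <= hi under the fitted model.
    return normal_cdf(hi, mean, var) - normal_cdf(lo, mean, var)

random.seed(0)
sample = [random.gauss(100.0, 15.0) for _ in range(10_000)]
mean, var = fit_gaussian(sample)
sel = range_selectivity(85.0, 115.0, mean, var)  # roughly +/- 1 sigma
```

Two parameters per column replace an entire histogram when the
distributional assumption holds - and can be badly wrong when it
doesn't, which is why the smarter planner would have to verify the fit.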
One simple first step would be to run an ANALYZE whenever a
sequential scan is executed. Is there a reason not to do this? It
could be controlled by a GUC variable in case someone wants
repeatable plans.
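
For comparison, autovacuum already does something threshold-shaped:
it re-analyzes a table once the number of changed rows exceeds a base
threshold plus a fraction of the table size. A toy model of that
triggering rule (names illustrative, not the actual GUC names):

```python
def needs_analyze(changed_rows, table_rows, base_threshold=50, scale_factor=0.1):
    # Re-analyze when accumulated changes outgrow the table:
    # changed > base + scale * rows (loosely modeled on autovacuum's rule).
    return changed_rows > base_threshold + scale_factor * table_rows

print(needs_analyze(200, 1000))  # enough churn to warrant re-analyze
print(needs_analyze(100, 1000))  # below threshold, skip
```

Triggering on sequential scans instead would tie re-analysis to the
moment stale statistics actually hurt, rather than to write volume.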
Further down the line, statistics could be collected during the
execution of any query - updating histograms on delete and update as
well.
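
A minimal sketch of what incremental maintenance could mean for an
equi-width histogram - bucket counts adjusted per row change instead of
rebuilt by ANALYZE (toy code, not how PostgreSQL stores statistics):

```python
class EquiWidthHistogram:
    # Toy incremental histogram: per-bucket counts are adjusted as rows
    # are inserted or deleted, instead of rebuilding via a full sample.
    def __init__(self, lo, hi, nbuckets):
        self.lo = lo
        self.width = (hi - lo) / nbuckets
        self.counts = [0] * nbuckets

    def _bucket(self, value):
        # Clamp out-of-range values into the edge buckets.
        b = int((value - self.lo) / self.width)
        return min(max(b, 0), len(self.counts) - 1)

    def insert(self, value):
        self.counts[self._bucket(value)] += 1

    def delete(self, value):
        self.counts[self._bucket(value)] -= 1
```

The open question is cost: per-row bookkeeping on every update and
delete is exactly the overhead ANALYZE's periodic sampling avoids.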
-M