From: Mark Kirkwood <markir(at)coretech(dot)co(dot)nz>
To: pgsql(at)mohawksoft(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query optimizer 8.0.1 (and 8.0)
Date: 2005-02-07 21:22:09
Message-ID: 4207DC01.9060102@coretech.co.nz
Lists: pgsql-hackers
Maybe I am missing something - ISTM that you can increase your
statistics target for those larger tables to obtain a larger (i.e.
better) sample.
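(For example, "ALTER TABLE names ALTER COLUMN lastname SET STATISTICS
1000" followed by a fresh ANALYZE does this per column - table and
column names illustrative, of course. IIRC ANALYZE samples roughly 300
rows per unit of target, so a target of 1000 looks at around 300,000
rows instead of the default 3,000.)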
regards
Mark
pgsql(at)mohawksoft(dot)com wrote:
> Any and all random sampling assumes a degree of uniform distribution. This
> is the basis of the model. It assumes that chunks of the whole will be
> representative of the whole (to some degree). This works when normal
> variations are more or less distributed uniformly. As variations and
> trends become less uniformly distributed, more samples are required to
> characterize them.
>
> Douglas Adams had a great device called the "Total Perspective Vortex"
> which inferred the whole of the universe from a piece of fairy cake. It
> was a subtle play on the absurd notion that a very small sample could
> lead to an understanding of an infinitely larger whole.
>
> On a very basic level, why bother sampling the whole table at all? Why not
> check one block and infer all information from that? Because we know that
> isn't enough data. In a table of 4.6 million rows, can you say with any
> mathematical certainty that a sample of 100 points can be, in any way,
> representative?
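For what it is worth, the two kinds of estimate behave differently here:
for a simple proportion the standard error is sqrt(p(1-p)/n), so 100
random points pin a fraction down to about +/-5 percentage points
whether the table holds 46 thousand or 4.6 million rows. Where a small
sample genuinely falls over is in estimating the number of distinct
values and the physical ordering of the data - those errors do grow
with table size at a fixed sample size.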
>
> Another problem with random sampling is trend analysis. Oftentimes there
> are minor trends in data. Ron pointed out the lastname/firstname trend.
> Although there seems to be no correlation between firstnames in the table,
> there are clearly groups or clusters of ordered data - an ordering that
> is missed by too small a sample.
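(That clustering is what the planner's per-column correlation statistic
in pg_stats tries to capture, and since it is computed from the same
small sample, it suffers in just the same way.)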
>
> I understand why you chose the Vitter algorithm, because it provides a
> basically sound methodology for sampling without knowledge of the size of
> the whole, but I think we can do better. I would suggest using the current
> algorithm the first time through, then adjusting the number of samples [n]
> based on the previous estimate of the size of the table [N]. Each
> successive ANALYZE will become more accurate. The Vitter algorithm is
> still useful, as [N] will always be an estimate.
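For list archaeology, the flavour of reservoir sampling being discussed
looks roughly like the sketch below. This is the simple "Algorithm R"
variant rather than the skip-ahead version Vitter actually gives (and
analyze.c implements), and the function and variable names are mine,
not the backend's:

    #include <stdio.h>
    #include <stdlib.h>

    /*
     * Reservoir-sample k row numbers from a stream of nitems rows.
     * Each row ends up in the reservoir with probability k/nitems.
     * Vitter's refinement computes how many rows to skip instead of
     * rolling a random number for every row; the result is the same.
     */
    static long
    sample_stream(long *reservoir, long k, long nitems)
    {
        long seen;

        for (seen = 0; seen < nitems; seen++)
        {
            if (seen < k)
                reservoir[seen] = seen;   /* fill the reservoir first */
            else
            {
                /* keep row 'seen' with probability k / (seen + 1) */
                long j = random() % (seen + 1); /* mildly biased; fine for a sketch */

                if (j < k)
                    reservoir[j] = seen;  /* evict a uniformly chosen slot */
            }
        }
        return seen;  /* exact N this pass; a later ANALYZE could scale k from it */
    }

    int
    main(void)
    {
        long reservoir[100];
        long n = sample_stream(reservoir, 100, 4600000L);

        printf("kept 100 of %ld rows\n", n);
        return 0;
    }

The adaptive step proposed above amounts to feeding the exact row count
returned by one pass back in, so that [n] can be chosen as a function
of [N] before the next scan starts.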
>