| From: | Manfred Koizar <mkoi-pg(at)aon(dot)at> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | pgsql-patches(at)postgresql(dot)org |
| Subject: | Re: O(samplesize) tuple sampling, proof of concept |
| Date: | 2004-04-05 22:23:19 |
| Message-ID: | s4j370l6ra56tvodcbg9baaf682q6cu3pn@email.aon.at |
| Lists: | pgsql-patches |
On Mon, 05 Apr 2004 15:37:07 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>I wouldn't bother with a GUC variable for the production patch.
Among other things the GUC variable will be thrown out for the final
version.
>> Once a block is selected for inspection, all tuples of this
>> block are accessed to get a good estimation of the live : dead row
>> ratio.
>
>Why are you bothering to compute the live:dead ratio?
That was basically bad wording. It should have been "... to get a good
estimation of live rows per page." Counting dead rows turned out to be
trivial, so I just did it and included the number in the debug messages.
Then it happened to be useful for method 2.
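
To illustrate what "live rows per page" is used for: a minimal sketch, not the patch's code, of how a per-page average from the inspected blocks could be scaled up to a row count estimate for the whole relation. The function name and its inputs are hypothetical.

```python
def estimate_total_rows(live_per_sampled_block, total_blocks):
    """Scale the mean live-row count of the inspected blocks to the relation.

    live_per_sampled_block: live tuples counted in each inspected block
    total_blocks: number of blocks in the relation (assumed known)
    """
    if not live_per_sampled_block:
        return 0.0
    mean_live = sum(live_per_sampled_block) / len(live_per_sampled_block)
    return mean_live * total_blocks


# Three inspected blocks with 10, 12 and 8 live rows in a 100-block table.
print(estimate_total_rows([10, 12, 8], 100))
```

The estimate is only as good as the per-block counts, so miscounting live rows in the inspected blocks feeds directly into the row count error.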
>> Because I was afraid that method 1 might be too expensive in terms of
>> CPU cycles, I implemented a small variation that skips tuples without
>> checking them for visibility; this is sampling_method 2.
>
>And? Does it matter?
There's a clearly visible difference for mid-size relations. I'm not
sure whether this can be attributed to visibility bit updating or other
noise-contributing factors.
Method 2 gives a row count estimation error between 10 and 17% where
method 1 is less than 1% off. (My updates generated dead tuples at very
evenly distributed locations by using conditions like WHERE mod(twenty,
7) = 0.)
>If that's as bad as it gets I think we are OK. You should redo the test
>with larger sample size though (try stats target = 100) to see if the
>answer changes.
Will do.
>I find -u diffs close to unreadable for reviewing purposes. Please
>submit diffs in -c format in future.
De gustibus non est disputandum :-)
Fortunately this patch wouldn't look much different. There is just a
bunch of "+" lines.
>AFAICS the rows will *always* be sorted already, and so this qsort is an
>utter waste of cycles. No?
I don't think so. We get the *blocks* in the correct order. But tuples
are still sampled by the Vitter method which starts to replace random
tuples after the pool is filled.
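
A minimal sketch of the point, using plain reservoir sampling (Algorithm R) as a stand-in for the Vitter-style sampling in the patch. All names are illustrative, and the (block, offset) pairs merely stand in for tuple positions. The input arrives in block order, but once the pool is full each accepted tuple overwrites a randomly chosen slot, so the pool generally does not stay in physical order.

```python
import random


def reservoir_sample(stream, k, rng):
    """Algorithm R: keep a uniform random sample of k items from stream."""
    pool = []
    for seen, item in enumerate(stream):
        if len(pool) < k:
            pool.append(item)          # fill phase: order is preserved
        else:
            j = rng.randrange(seen + 1)
            if j < k:
                pool[j] = item         # replace phase: random slot is overwritten
    return pool


def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


rng = random.Random(0)
# Tuples arrive in physical order: block by block, offset by offset.
tuples = [(block, offset) for block in range(50) for offset in range(10)]
sample = reservoir_sample(tuples, 20, rng)

print(is_sorted(sample))          # whether the pool happens to be in order
print(is_sorted(sorted(sample)))  # an explicit sort restores physical order
```

Only the fill phase preserves input order; every later replacement can put a high-numbered tuple into a low slot, which is why a final sort is still needed.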
BTW, thanks for the paper!
Servus
Manfred