Quick Links

Re: query slows down with more accurate stats

From:	Manfred Koizar <mkoi-pg(at)aon(dot)at>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: query slows down with more accurate stats
Date:	2004-04-16 10:16:11
Message-ID:	hjau701l6u8kavv7ukrqprugi2bkuocd2a@email.aon.at
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-performance

On Thu, 15 Apr 2004 20:18:49 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> getting several tuples from the same page is more likely
>> than with the old method.
>
>Hm, are you sure?

Almost sure. Let's look at a corner case: What is the probability of
getting a sample with no two tuples from the same page? To simplify the
problem assume that each page contains the same number of tuples c.

If the number of pages is B and the sample size is n, a perfect sampling
method collects a sample where all tuples come from different pages with
probability (in OpenOffice.org syntax):

p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}}

or in C:

p = 1.0;
for (i = 0; i < n; ++i)
p *= c*(B - i) / (c*B - i)

This probability grows with increasing B.

>Also, I'm not at all sure that the old method satisfies that constraint
>completely in the presence of nonuniform numbers of tuples per page,
>so we'd not necessarily be going backwards anyhow ...

Yes, it boils down to a decision whether we want to replace one not
quite perfect sampling method with another not quite perfect method.
I'm still working on putting together the pros and cons ...

Servus
Manfred

In response to

Re: query slows down with more accurate stats at 2004-04-16 00:18:49 from Tom Lane

Responses

Re: query slows down with more accurate stats at 2004-04-16 14:34:49 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrew Sullivan	2004-04-16 12:10:20	Re: Socket communication for contrib
Previous Message	Christopher Kings-Lynne	2004-04-16 02:41:10	Re: [HACKERS] Remove MySQL Tools from Source?

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Dirk Lutzebäck	2004-04-16 13:03:28	RESOLVED: Re: Wierd context-switching issue on Xeon
Previous Message	Rajesh Kumar Mallah	2004-04-16 08:23:50	Re: [ SOLVED ] select count(*) very slow on an already