From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mooney, Ryan" <ryan(dot)mooney(at)pnl(dot)gov>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large DB
Date: 2004-04-02 18:05:01
Message-ID: 2r6r60h049h0lg4s9ve3qe1h38ubprpo30@email.aon.at
Lists: pgsql-general pgsql-hackers
[time to move this to -hackers]
On Fri, 02 Apr 2004 11:16:21 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
>> The first step, however, (acquire_sample_rows() in analyze.c) has to
>> read more rows than finally end up in the sample. It visits less than
>> O(nblocks) pages but certainly more than O(1).
>
>> A vague feeling tries to tell me that the number of page reads is
>> somehow related to the harmonic numbers 1 + 1/2 + 1/3 + ... + 1/n, which
>> grow like O(ln(n)).
>
>Good guess. Vitter's paper says the expected time to sample n rows from
>a table of size N is O(n * (1 + log(N/n))).
Well, for what I was trying to find out, my wild guess seems to be wrong.
I don't doubt that Vitter's formula is correct, but it assumes that
access to any tuple has the same cost. That does not hold for our
problem. With 100 tuples per page, we access the first sample_size
tuples at a cost of 0.01 sequential page reads per tuple. Later we use
fewer and fewer tuples per page, which results in a higher per-tuple
cost. Near the end of a large relation we can expect to access only one
tuple per page, and more and more pages are skipped, so prefetching
doesn't help anymore.
Playing around with some real numbers (for 100 tuples/page and a sample
size of 3000) I got:
 rel size | page
  (pages) | reads
----------+--------
       30 |     30
      300 |    300     (expectation is something like 299.9995)
      500 |    499
       1K |    990
       3K |   2.6K
      30K |     8K
     100K |    12K
       1M |    19K
      10M |    26K
     100M |    33K
This growth rate is steeper than O(log(nblocks)).
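Here is a rough standalone sketch (plain C, not backend code) of how such
expectations can be computed; it assumes that tuple t is picked for the
reservoir with probability min(1, SAMPLE_SIZE/t) independently of all
other tuples, and that a page has to be read iff at least one of its
tuples is picked:

/*
 * Rough sketch, not backend code: expected number of page reads for
 * the current one-stage row sampling, under the simplifying assumption
 * that tuple t is picked for the reservoir with probability
 * min(1, SAMPLE_SIZE/t) independently of all other tuples, and that a
 * page has to be read iff at least one of its tuples is picked.
 */
#include <stdio.h>

#define TUPLES_PER_PAGE 100
#define SAMPLE_SIZE     3000

static double
expected_page_reads(long nblocks)
{
    double      reads = 0.0;
    long        b;

    for (b = 0; b < nblocks; b++)
    {
        double      p_skip = 1.0;   /* P(no tuple on this page is picked) */
        long        t;

        for (t = 0; t < TUPLES_PER_PAGE; t++)
        {
            long        tupno = b * TUPLES_PER_PAGE + t + 1;    /* 1-based */
            double      p_pick = (tupno <= SAMPLE_SIZE) ?
                1.0 : (double) SAMPLE_SIZE / tupno;

            p_skip *= 1.0 - p_pick;
        }
        reads += 1.0 - p_skip;      /* P(page is read at least once) */
    }
    return reads;
}

int
main(void)
{
    /* larger relations just take proportionally longer, the loop is O(N) */
    long        sizes[] = {30, 300, 500, 1000, 3000, 30000, 100000, 1000000};
    int         i;

    for (i = 0; i < (int) (sizeof(sizes) / sizeof(sizes[0])); i++)
        printf("%8ld pages: %12.4f expected page reads\n",
               sizes[i], expected_page_reads(sizes[i]));
    return 0;
}

For the smallest relation every tuple number is <= SAMPLE_SIZE, so every
page is read and the result is exactly 30, matching the first row of the
table.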
>> I have an idea how this could be done with O(1) page reads.
What I have in mind is a kind of "Double Vitter" algorithm. Whatever we
do to get our sample of rows, in the end the sampled rows come from no
more than sample_size different blocks. So my idea is to first create a
random sample of sample_size block numbers, and then to sample the rows
out of this pool of blocks.
I have to think harder, though, about what to do with those 400 pages
that are not accessed when the sample size is 3000 ...
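To make the two stages concrete, here is a standalone simulation of the
idea (the names and structure are mine, for illustration only; this is
not analyze.c code):

/*
 * Sketch of the "Double Vitter" idea as a standalone simulation.
 * Stage 1 draws a uniform sample of sample_size block numbers,
 * stage 2 reservoir-samples sample_size tuple ids from the tuples
 * found in exactly those blocks, so no more than sample_size pages
 * are ever read.
 */
#include <stdio.h>
#include <stdlib.h>

#define TUPLES_PER_PAGE 100

/* Stage 1: reservoir-sample k block numbers out of 0 .. nblocks-1. */
static void
sample_blocks(long nblocks, int k, long *blocks)
{
    long        b;

    for (b = 0; b < nblocks; b++)
    {
        if (b < k)
            blocks[b] = b;
        else
        {
            long        j = random() % (b + 1);

            if (j < k)
                blocks[j] = b;
        }
    }
}

/* Stage 2: reservoir-sample k tuple ids from the chosen blocks. */
static void
sample_rows(const long *blocks, int npoolblocks, int k, long *rows)
{
    long        seen = 0;
    int         i,
                t;

    for (i = 0; i < npoolblocks; i++)
    {
        for (t = 0; t < TUPLES_PER_PAGE; t++)   /* "read" one page */
        {
            long        tupleid = blocks[i] * TUPLES_PER_PAGE + t;

            if (seen < k)
                rows[seen] = tupleid;
            else
            {
                long        j = random() % (seen + 1);

                if (j < k)
                    rows[j] = tupleid;
            }
            seen++;
        }
    }
}

int
main(void)
{
    long        nblocks = 1000000;  /* 1M pages */
    int         sample_size = 3000;
    long       *blocks = malloc(sample_size * sizeof(long));
    long       *rows = malloc(sample_size * sizeof(long));

    srandom(20040402);
    sample_blocks(nblocks, sample_size, blocks);
    sample_rows(blocks, sample_size, sample_size, rows);

    /* at most sample_size pages are read, i.e. O(1) in nblocks */
    printf("first sampled tuple id: %ld\n", rows[0]);
    free(blocks);
    free(rows);
    return 0;
}

Note that stage 2 treats all tuples in the block pool alike, so it leans
on exactly the constant-tuple-density assumption quoted below; and stage
1 still costs O(nblocks) CPU, but no page reads.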
>The hard part is getting a genuinely random sample when we don't know N
>in advance. We do however know the table size in blocks, so if you're
>willing to make assumptions about constant tuple density you could do
>something different. (But the tuple density assumption is exactly the
>weak spot of what we've got, so I'm unconvinced that would be a big step
>forward.)
Starting the scan at some random blocks should help against the common
case where dead tuples are concentrated near the start of the relation.
And I plan to factor information about dead-tuple hits into an
increasingly accurate estimate of the dead/live tuple ratio.
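The rough shape I have in mind for that estimate is something like the
following (illustrative only, none of these names exist in analyze.c):

/*
 * Illustrative only: keep counters of live and dead tuples found on
 * the pages actually read and extrapolate over the whole relation;
 * the estimate gets better as more pages are visited.
 */
#include <stdio.h>

typedef struct
{
    double      live_seen;      /* live tuples seen so far */
    double      dead_seen;      /* dead tuples seen so far */
    double      pages_seen;     /* pages actually read so far */
} SampleStats;

static double
estimate_live_tuples(const SampleStats *s, double total_pages)
{
    double      tuples_per_page;
    double      live_fraction;

    if (s->pages_seen <= 0 || s->live_seen + s->dead_seen <= 0)
        return 0.0;
    tuples_per_page = (s->live_seen + s->dead_seen) / s->pages_seen;
    live_fraction = s->live_seen / (s->live_seen + s->dead_seen);
    return total_pages * tuples_per_page * live_fraction;
}

int
main(void)
{
    SampleStats stats = {2800.0, 200.0, 30.0};      /* made-up counts */

    printf("estimated live tuples: %.0f\n",
           estimate_live_tuples(&stats, 1000000.0));
    return 0;
}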
Servus
Manfred