Re: [GENERAL] Large DB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [GENERAL] Large DB
Date:	2004-04-03 00:57:47
Message-ID:	10036.1080953867@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
>> You'd run the Vitter
>> algorithm separately to decide whether to keep or discard each live row
>> you find in the blocks you read.

> You mean once a block is sampled we inspect it in any case? This was
> not the way I had planned to do it, but I'll keep this idea in mind.

Well, once we've gone to the trouble of reading in a block we
definitely want to count the tuples in it, for the purposes of
extrapolating the total number of tuples in the relation. Given
that, I think the most painless route is simply to use the Vitter
algorithm with the number-of-tuples-scanned as the count variable.
You could dump the logic in acquire_sample_rows that tries to estimate
where to read the N'th tuple from.
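
A rough sketch of that idea, for illustration only. It is written in Python rather than the C of acquire_sample_rows, and it uses the plain reservoir method (Algorithm R) instead of Vitter's faster skip-based variant. The names sample_rows, blocks, total_blocks and target_rows are hypothetical and do not come from the PostgreSQL source. Each block is assumed to arrive as a list of (row, is_live) pairs.

```python
import random


def sample_rows(blocks, total_blocks, target_rows, rng=None):
    """Reservoir-sample live rows from already-read blocks.

    blocks       -- iterable of blocks, each a list of (row, is_live) pairs
                    (hypothetical representation, not PostgreSQL's)
    total_blocks -- number of blocks in the whole relation
    target_rows  -- desired sample size
    Returns (sample, live_rows_seen, estimated_total_rows).
    """
    rng = rng or random.Random()
    reservoir = []
    live_seen = 0      # the count variable: live tuples scanned so far
    blocks_read = 0

    for block in blocks:
        blocks_read += 1
        for row, is_live in block:
            if not is_live:
                continue              # dead rows are neither counted nor sampled
            live_seen += 1
            if len(reservoir) < target_rows:
                reservoir.append(row)  # fill the reservoir first
            else:
                # Keep this row with probability target_rows / live_seen,
                # replacing a uniformly chosen current member.
                j = rng.randrange(live_seen)
                if j < target_rows:
                    reservoir[j] = row

    # Extrapolate from the live-row density of the blocks actually read.
    if blocks_read:
        estimated_total = live_seen / blocks_read * total_blocks
    else:
        estimated_total = 0.0
    return reservoir, live_seen, estimated_total
```

Because every live row in a block that has been read passes through the counter, the same pass yields both the sample and the density used to extrapolate the relation's total row count, so no separate logic is needed to guess where the N'th tuple lives.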

If you like I can send you the Vitter paper off-list (I have a PDF of
it). The comments in the code are not really intended to teach someone
what it's good for ...

regards, tom lane

In response to

Re: [GENERAL] Large DB at 2004-04-03 00:40:39 from Manfred Koizar

Responses

Re: [GENERAL] Large DB at 2004-04-03 01:09:51 from Manfred Koizar