From: | "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Improving N-Distinct estimation by ANALYZE |
Date: | 2006-01-06 16:09:06 |
Message-ID: | 20060106160906.GH3902@pervasive.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jan 06, 2006 at 01:24:41AM -0500, Greg Stark wrote:
> > 5% based on block-based sampling is reasonable; that means a straight 5% of
> > the on-disk size of the table, so 5gb for a 100gb table. With random-row
> > sampling, that would require as much as 25% of the table, making it easier
> > to just scan the whole thing.
>
> Postgres's current sample sizes are clearly geared towards the histograms
> where they are entirely realistic. All of the distinct estimates are clearly
> just ad hoc attempts based on the existing sampling.
>
> Is a mechanism that is only 5x faster than reading the whole table (assuming
> random_page_cost of 4) and is off by more than a factor of three 10% of the
> time really worth it?
Before we start debating merits of proposals based on random reads, can
someone confirm that the sampling code actually does read randomly? I
looked at it yesterday; there is a comment that states that blocks to be
scanned are passed to the analyze function in physical order, and AFAICT
the function that chooses blocks does so based strictly on applying a
probability function to block numbers as it increments a counter. It
seems that any reading is actually sequential and not random, which
makes all the random_page_cost hand-waving null and void.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2006-01-06 16:18:15 | Re: could not access status of transaction 0 |
Previous Message | Stefan Kaltenbrunner | 2006-01-06 16:01:57 | could not access status of transaction 0 |