| From: | Greg Stark <gsstark(at)mit(dot)edu> |
|---|---|
| To: | josh(at)agliodbs(dot)com |
| Cc: | pgsql-hackers(at)postgresql(dot)org, Greg Stark <gsstark(at)mit(dot)edu>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com> |
| Subject: | Re: Improving N-Distinct estimation by ANALYZE |
| Date: | 2006-01-06 23:36:52 |
| Message-ID: | 87psn52ajv.fsf@stark.xeocode.com |
| Lists: | pgsql-hackers |
Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > These numbers don't make much sense to me. It seems like 5% is about as
> > slow as reading the whole file which is even worse than I expected. I
> > thought I was being a bit pessimistic to think reading 5% would be as
> > slow as reading 20% of the table.
>
> It's about what *I* expected. Disk seeking is the bane of many access
> methods.
Sure, but that bad? That means realistic random_page_cost values should be
something more like 20 rather than 4. And that's with seeks only going to
subsequent blocks in a single file, which one would expect to cost less than
the half rotation a truly random seek averages. That seems worse than anyone
expected.
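To put rough numbers on that, here is a back-of-envelope sketch of the effective random_page_cost implied by such behavior. All the drive figures (seek time, sequential throughput) are illustrative assumptions, not measurements:

```python
# Back-of-envelope estimate of effective random_page_cost.
# The drive figures below are assumptions for illustration only.
seek_ms = 2.5            # assumed cost of a short forward seek (< half rotation)
seq_mb_per_s = 60.0      # assumed sequential read throughput
page_kb = 8.0            # PostgreSQL page size

# Time to read one page sequentially vs. after a seek.
seq_ms_per_page = page_kb / 1024.0 / seq_mb_per_s * 1000.0   # ~0.13 ms
random_ms_per_page = seek_ms + seq_ms_per_page

# Ratio of the two is the effective random_page_cost.
random_page_cost = random_ms_per_page / seq_ms_per_page
print(round(random_page_cost, 1))
```

With these assumed numbers the ratio comes out near 20, an order of magnitude above the default of 4, which is consistent with the behavior described above.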
> Anyway, since the proof is in the pudding, Simon and I will be working on
> some demo code for different sampling methods so that we can debate
> results rather than theory.
Note that if these numbers are realistic, then there's no I/O benefit to any
sampling method that requires reading anything like 5% of the entire table and
is still unreliable. Instead it makes more sense to implement an algorithm that
requires a full table scan and can produce good results more reliably.
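That break-even point falls out of a simple cost model. The sketch below compares randomly sampling a fraction of a table's blocks against a full sequential scan; the model and the random_page_cost value of ~20 (from the argument above) are assumptions, not measurements:

```python
# Sketch: relative I/O cost of randomly sampling a fraction of a table's
# blocks vs. scanning the whole table sequentially. Cost units are
# "sequential page reads"; random_page_cost of ~20 is an assumption.
def sampling_cost(total_pages, fraction, random_page_cost):
    # Pessimistically assume every sampled page needs its own seek.
    return total_pages * fraction * random_page_cost

def full_scan_cost(total_pages):
    # Each sequential page costs one unit by definition.
    return total_pages * 1.0

pages = 100_000
sample = sampling_cost(pages, 0.05, 20.0)   # 5% block sample
scan = full_scan_cost(pages)
print(sample, scan)
```

Under these assumptions a 5% random sample costs exactly as much I/O as reading the whole table, which is why the sampling approach buys nothing here.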
--
greg