From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Gregory Stark <stark(at)enterprisedb(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: benchmarking the query planner |
Date: | 2008-12-12 16:40:57 |
Message-ID: | 1229100057.8673.29.camel@ebony.2ndQuadrant |
Lists: | pgsql-hackers |
On Fri, 2008-12-12 at 16:10 +0000, Gregory Stark wrote:
> Right, but increasing our sample size by a factor of 150 for a 100M
> row table doesn't seem like a reasonable solution to one metric being
> bogus.
>
> For that matter, if we do consider sampling 5% of the table we may as
> well just go ahead and scan the whole table. It wouldn't take much
> longer and it would actually produce good estimates.
As I said, we would only increase the sample for ndistinct, not for the
other statistics. At the moment we fail badly to estimate ndistinct
correctly on clustered data in large tables. Using block-level sampling
would prevent that. Right now we may as well use a random number
generator.
The amount of I/O could stay the same: just sample all of the rows on
each block we read. Lifting the sample size will help large tables. Will
it be perfect? No. But I'll take "better" over "not working at all".
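
To make the I/O point concrete, here is a minimal sketch in Python. It is a toy model, not the ANALYZE code: the block count, the 3000-block sample and the fixed 100 rows per block are assumptions chosen for illustration, and all function names are hypothetical. It only counts blocks read and rows collected; it says nothing about which ndistinct estimator should then be applied to the larger sample.

```python
import random

ROWS_PER_BLOCK = 100  # assumed; real heap pages hold a variable number of rows


def pick_blocks(n_blocks, k, seed=0):
    """Choose k distinct block numbers uniformly at random."""
    return sorted(random.Random(seed).sample(range(n_blocks), k))


def row_level_sample(blocks, seed=0):
    """Read each chosen block but keep only one random row from it."""
    rng = random.Random(seed)
    return [(b, rng.randrange(ROWS_PER_BLOCK)) for b in blocks]


def block_level_sample(blocks):
    """Read each chosen block and keep every row on it."""
    return [(b, r) for b in blocks for r in range(ROWS_PER_BLOCK)]


# Hypothetical table of 1,000,000 blocks, of which 3000 are sampled.
blocks = pick_blocks(n_blocks=1_000_000, k=3000)
row_sample = row_level_sample(blocks)
block_sample = block_level_sample(blocks)

# Both strategies touch exactly the same set of blocks ...
blocks_read_row = len({b for b, _ in row_sample})
blocks_read_block = len({b for b, _ in block_sample})
print(blocks_read_row, blocks_read_block)  # 3000 3000

# ... but the block-level sample hands far more rows to the estimator.
print(len(row_sample), len(block_sample))  # 3000 300000
```

Under these assumptions the same 3000 block reads yield 100 times as many sampled rows; the extra cost is CPU and memory for the larger sample, not additional I/O.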
If we are going to quote the literature, we should believe all of the
literature. We can't just listen to some people who did a few tests
with sample size, but then ignore the guy who designed the MS optimizer
and many others.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support