From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-09 18:47:16
Message-ID: 24417.1386614836@sss.pgh.pa.us
Lists: pgsql-hackers
Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Reading 5% of a 200GB table is going to be considerably faster than
> reading the whole thing, if that 5% is being scanned in a way that the
> FS understands.
Really? See the upthread point that reading one sector from each track
has just as much seek overhead as reading the whole thing. I will grant
that if you think that reading a *contiguous* 5% of the table is good
enough, you can make it faster --- but I don't believe offhand that
you can make this better without seriously compromising the randomness
of your sample. Too many tables are loaded roughly in time order, or
in other ways that make contiguous subsets nonrandom.
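To make that nonrandomness concrete, here is a minimal sketch in plain Python (made-up numbers; this is not ANALYZE's actual sampler) comparing a uniform random 5% row sample against one contiguous 5% slice of a table stored in insertion order, where the sampled column is correlated with load time:

    import random

    N = 1_000_000                       # rows, stored in insertion (time) order
    values = [i / N for i in range(N)]  # a column correlated with load time
    k = N // 20                         # a 5% sample

    # Uniform random row sample: an unbiased estimator of the column mean.
    rand_mean = sum(random.sample(values, k)) / k

    # One contiguous 5% slice: its mean reflects whatever era of the load
    # it happens to land in, not the table as a whole.
    start = random.randrange(N - k + 1)
    contig_mean = sum(values[start:start + k]) / k

    print(f"true mean:     {sum(values) / N:.4f}")  # ~0.5000
    print(f"random 5%:     {rand_mean:.4f}")        # ~0.5000
    print(f"contiguous 5%: {contig_mean:.4f}")      # anywhere in [0.025, 0.975]

The random sample recovers the true mean; the contiguous slice estimates the mean of whichever era of the load it landed in, which is exactly the bias at issue.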
> You do seem kind of hostile to the idea of full-page-sampling, going
> pretty far beyond the "I'd need to see the math". Why?
I'm detecting a lot of hostility to assertions unsupported by any math.
For good reason.
regards, tom lane