From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE sampling is too good |
Date: | 2013-12-11 02:03:52 |
Message-ID: | 16278.1386727432@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Peter Geoghegan <pg(at)heroku(dot)com> writes:
> Again, it isn't as if the likely efficacy of *some* block sampling
> approach is in question. I'm sure analyze.c is currently naive about
> many things.
It's not *that* naive; this is already about a third-generation algorithm.
The last major revision (commit 9d6570b8a4) was to address problems with
misestimating the number of live tuples due to nonuniform tuple density
in a table. IIRC, the previous code could be seriously misled if the
first few pages in the table were significantly non-representative of the
live-tuple density further on. I'm not sure how we can significantly
reduce the number of blocks examined without re-introducing that hazard in
some form. In particular, given that you want to see at least N tuples,
how many blocks will you read if you don't have an a-priori estimate of
tuple density? You have to decide that before you start sampling blocks,
if you want all blocks to have the same probability of being selected
and you want to read them in sequence.
regards, tom lane
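
The constraint in the last paragraph can be made concrete with a minimal sketch, assuming a Knuth-style selection sampling scheme (Algorithm S). All names below are hypothetical and are not taken from analyze.c. The point it tries to show is that the sample size n must be fixed before the first block is visited, because each block is chosen with probability (n - m) / (N - t), and that choosing n from a tuple target requires some density estimate up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sampler state; illustrative only, not the analyze.c structs. */
typedef struct {
    uint32_t total_blocks;  /* N: number of blocks in the table */
    uint32_t want_blocks;   /* n: sample size, fixed before the scan starts */
    uint32_t next_block;    /* t: next block number to consider */
    uint32_t chosen;        /* m: blocks selected so far */
    uint64_t rng_state;     /* state of a small LCG so the demo is self-contained */
} BlockSketch;

/* Blocks to read in order to expect at least target_tuples tuples, given an
 * assumed (a priori) density. Without that estimate, n cannot be chosen. */
static uint32_t sketch_blocks_needed(uint32_t target_tuples,
                                     uint32_t est_tuples_per_block)
{
    if (est_tuples_per_block == 0)
        return UINT32_MAX;  /* no usable estimate: caller must read everything */
    /* ceiling division, done in 64 bits to avoid overflow */
    return (uint32_t) (((uint64_t) target_tuples + est_tuples_per_block - 1)
                       / est_tuples_per_block);
}

/* Uniform double in [0, 1) from a 64-bit LCG (demo quality only). */
static double sketch_uniform(BlockSketch *bs)
{
    bs->rng_state = bs->rng_state * 6364136223846793005ULL
                    + 1442695040888963407ULL;
    return (double) (bs->rng_state >> 11) / 9007199254740992.0;  /* 2^53 */
}

static void sketch_init(BlockSketch *bs, uint32_t total, uint32_t want,
                        uint64_t seed)
{
    bs->total_blocks = total;
    bs->want_blocks = want > total ? total : want;  /* cannot sample more than N */
    bs->next_block = 0;
    bs->chosen = 0;
    bs->rng_state = seed;
}

/* Stores the next sampled block number and returns true, or returns false
 * once n blocks have been produced. Blocks come out in increasing order,
 * so the table is read in sequence. */
static bool sketch_next(BlockSketch *bs, uint32_t *block)
{
    while (bs->chosen < bs->want_blocks &&
           bs->next_block < bs->total_blocks) {
        uint32_t remaining = bs->total_blocks - bs->next_block;  /* N - t */
        uint32_t needed = bs->want_blocks - bs->chosen;          /* n - m */
        uint32_t candidate = bs->next_block++;

        assert(needed <= remaining);  /* invariant of selection sampling */

        /* Select with probability needed / remaining; take every remaining
         * block once exactly as many are left as are still needed. */
        if (needed == remaining ||
            sketch_uniform(bs) * (double) remaining < (double) needed) {
            bs->chosen++;
            *block = candidate;
            return true;
        }
    }
    return false;
}
```

A caller would compute n once, for example with sketch_blocks_needed(30000, assumed_density), pass it to sketch_init(), and then loop on sketch_next(). If the assumed density is too high, the scan ends with fewer than the desired number of tuples; if it is too low, more blocks are read than necessary. That is the trade-off the message raises.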