From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE sampling is too good |
Date: | 2013-12-11 22:39:04 |
Message-ID: | 20131211223904.GA13377@svana.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Dec 12, 2013 at 07:22:59AM +1300, Gavin Flower wrote:
> Surely we want to sample a 'constant fraction' (obviously, in
> practice you have to sample an integral number of rows in a page!)
> of rows per page? The simplest way, as Tom suggests, is to use all
> the rows in a page.
>
> However, if you wanted the same number of rows from a greater number
> of pages, you could (for example) select a quarter of the rows from
> each page. In which case, when this is a fractional number: take
> the integral number of rows, plus on extra row with a probability
> equal to the fraction (here 0.25).
In this discussion we've mostly used block = 1 postgresql block of 8k.
But when reading from a disk once you've read one block you can
basically read the following ones practically for free.
So I wonder if you could make your sampling read always 16 consecutive
blocks, but then use 25-50% of the tuples. That way you get many more
tuples for the same amount of disk I/O seeks..
Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer
From | Date | Subject | |
---|---|---|---|
Next Message | satoshi yamada | 2013-12-11 22:46:25 | Re: Why standby.max_connections must be higher than primary.max_connections? |
Previous Message | Greg Stark | 2013-12-11 22:33:40 | Re: ANALYZE sampling is too good |