From: | Richard Huxton <dev(at)archonet(dot)com> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Block-level CRC checks |
Date: | 2009-12-01 22:46:32 |
Message-ID: | 4B159CC8.9030201@archonet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Greg Stark wrote:
> On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton <dev(at)archonet(dot)com> wrote:
>> Why are we writing out the hint bits to disk anyway? Is it really so
>> slow to calculate them on read + cache them that it's worth all this
>> trouble? Are they not also to blame for the "write my import data twice"
>> feature?
>
> It would be interesting to experiment with different strategies. But
> the results would depend a lot on workloads and I doubt one strategy
> is best for everyone.
>
> It has often been suggested that we could set the hint bits but not
> dirty the page, so they would never be written out unless some other
> update hit the page. In most use cases that would probably result in
> the right thing happening where we avoid half the writes but still
> stop doing transaction status lookups relatively promptly. The scary
> thing is that there might be use cases such as static data loaded
> where the hint bits never get set and every scan of the page has to
> recheck those statuses until the tuples are frozen.
And how scary is that? Assuming we cache the hints...
1. With the page itself, so same lifespan
2. Separately, perhaps with a different (longer) lifespan.
Separately would then let you trade complexity for compactness - "all of
block B is deleted", "all of table T is visible".
So what is the cost of calculating the hint-bits for a whole block of
tuples in one go vs reading that block from actual spinning disk?
> There does need to be something like the hint bits which does
> eventually have to be set because we can't keep transaction
> information around forever. Even if you keep the transaction
> information all the way back to the last freeze date (up to about 1GB
> and change I think) then the data has to be written twice, the second
> time is to freeze the transactions. In the worst case then reading a
> page requires a random page access (or two) from anywhere in that 1GB+
> file for each tuple on the page (whether visible to us or not).
While on that topic - I'm assuming freezing requires substantially more
effort than updating hint bits?
--
Richard Huxton
Archonet Ltd
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2009-12-01 22:47:56 | Re: Block-level CRC checks |
Previous Message | Greg Smith | 2009-12-01 22:40:06 | Re: [CORE] EOL for 7.4? |