Re: Block-level CRC checks

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-12-01 22:13:20
Message-ID: 407d949e0912011413u132ace3doc651662edbd1ed41@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton <dev(at)archonet(dot)com> wrote:
> Why are we writing out the hint bits to disk anyway? Is it really so
> slow to calculate them on read + cache them that it's worth all this
> trouble? Are they not also to blame for the "write my import data twice"
> feature?

It would be interesting to experiment with different strategies. But
the results would depend a lot on workloads and I doubt one strategy
is best for everyone.

It has often been suggested that we could set the hint bits but not
dirty the page, so they would never be written out unless some other
update hit the page. In most use cases that would probably do the
right thing: we avoid half the writes but still stop doing transaction
status lookups relatively promptly. The scary thing is that there
might be use cases, such as static data that is loaded once and never
updated, where the hint bits never reach disk and every scan of the
page has to recheck those statuses until the tuples are frozen.
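
A toy model of that trade-off, as a minimal Python sketch. Everything in
it (the Page and BufferPool classes, the clog_lookups counter, the
simulate helper) is hypothetical and only mimics the idea; it is not
PostgreSQL code. It counts status lookups when hints are set in memory
without dirtying the page, first for static data and then for a page
that happens to get one unrelated update.

```python
# Toy model: hint bits are set in memory but do not dirty the page.
# All names are hypothetical; this is not how PostgreSQL is structured.

class Page:
    def __init__(self, ntuples):
        self.hinted = [False] * ntuples  # one "hint bit" per tuple
        self.dirty = False


class BufferPool:
    def __init__(self, ntuples):
        self.on_disk = Page(ntuples)  # the persistent copy
        self.cached = None            # the in-memory copy, if any
        self.clog_lookups = 0         # transaction status lookups performed
        self.writes = 0               # page writes performed

    def _load(self):
        if self.cached is None:
            self.cached = Page(len(self.on_disk.hinted))
            self.cached.hinted = list(self.on_disk.hinted)

    def scan(self):
        self._load()
        for i, hinted in enumerate(self.cached.hinted):
            if not hinted:
                self.clog_lookups += 1        # consult transaction status
                self.cached.hinted[i] = True  # set the hint, do NOT dirty

    def update(self):
        # Some unrelated change dirties the page; hints ride along on write.
        self._load()
        self.cached.dirty = True

    def evict(self):
        if self.cached is not None and self.cached.dirty:
            self.on_disk.hinted = list(self.cached.hinted)
            self.writes += 1
        self.cached = None  # a clean page is dropped, hints and all


def simulate(nscans, with_update):
    pool = BufferPool(ntuples=100)
    for n in range(nscans):
        pool.scan()
        if with_update and n == 0:
            pool.update()
        pool.evict()
    return pool


static = simulate(3, with_update=False)
updated = simulate(3, with_update=True)
print(static.clog_lookups, static.writes)    # 300 0
print(updated.clog_lookups, updated.writes)  # 100 1
```

In this model the static page pays for 100 lookups on every scan because
its hints are thrown away at each eviction, while the updated page pays
once.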

(Not dirtying the page almost gets us out of the CRC problems -- it
doesn't in our current setup because we don't take a lock when setting
the hint bits, so you could set one on a page that someone is in the
middle of checksumming and writing. Other solutions were proposed for
that, including making hint bit updates lock the page or double
buffering the write.)
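
The double-buffering idea can be sketched in a few lines of Python. This
is only an illustration under assumptions: the function names and the
8192-byte page are made up for the example, and zlib.crc32 stands in for
whatever checksum would actually be used. The point is that the checksum
is computed over a private copy, so an unlocked hint bit update to the
shared page cannot make the written bytes disagree with their CRC.

```python
import zlib


def set_hint_bit(page):
    # Stand-in for an unlocked hint bit update: flip one bit in place.
    page[0] |= 0x01


def write_with_checksum(shared_page, concurrent_change=None):
    """Snapshot the shared page, checksum the snapshot, 'write' the snapshot."""
    private = bytes(shared_page)       # private copy; later changes can't touch it
    crc = zlib.crc32(private)
    if concurrent_change is not None:
        concurrent_change(shared_page)  # simulate a hint bit set mid-write
    written = private                  # what reaches disk is the snapshot
    return written, crc


shared = bytearray(8192)  # hypothetical page size for the example
written, crc = write_with_checksum(shared, set_hint_bit)

# The bytes written still match their checksum...
print(zlib.crc32(written) == crc)        # True
# ...whereas the live shared page no longer does.
print(zlib.crc32(bytes(shared)) == crc)  # False
```

The cost is one extra page copy per write, which is the trade against
taking a lock for every hint bit update.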

There does need to be something like the hint bits that eventually
gets set, because we can't keep transaction information around
forever. Even if you keep the transaction information all the way back
to the last freeze date (up to about 1GB and change, I think), the
data still has to be written twice; the second write is to freeze the
transactions. In the worst case, reading a page then requires a random
page access (or two) from anywhere in that 1GB+ file for each tuple on
the page (whether visible to us or not).
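
One way to arrive at a figure of that order, as a back-of-the-envelope
Python sketch. The inputs are assumptions, not numbers taken from this
message: two status bits per transaction, a 32-bit transaction ID space,
and a made-up 100 tuples per page for the worst-case read.

```python
# Assumptions: 2 status bits per transaction, 32-bit transaction IDs.
BITS_PER_XACT = 2
XID_SPACE = 2 ** 32

status_bytes = XID_SPACE * BITS_PER_XACT // 8
print(status_bytes / 2 ** 30)  # 1.0 (GiB)

# Worst case from the paragraph above: one or two random status-page
# accesses per tuple. TUPLES_PER_PAGE is a hypothetical value.
TUPLES_PER_PAGE = 100
worst_case_accesses = (TUPLES_PER_PAGE * 1, TUPLES_PER_PAGE * 2)
print(worst_case_accesses)     # (100, 200)
```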
--
greg
