From: | Greg Stark <stark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Moving more work outside WALInsertLock |
Date: | 2011-12-24 19:41:04 |
Message-ID: | CAM-w4HO=mdWTNhZfCRYGtAjdzHd-MT8JWFqBsPNZn0OcAv47_w@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Dec 16, 2011 at 3:27 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> On its own that sounds dangerous, but it's not. When we need to confirm
>> the prev link we already know what we expect it to be, so CRC-ing it
>> is overkill. That isn't true of any other part of the WAL record, so
>> the prev link is the only thing we can relax, but that's OK because we
>> can CRC check everything else outside of the locked section.
>
>> That isn't my idea, but I'm happy to put it on the table since I'm not shy.
>
> I'm glad it's not your idea, because it's a bad one.

I'll take the blame or credit here.

> A large part of
> the point of CRC'ing WAL records is to guard against torn-page problems
> in the WAL files, and doing things like that would give up a significant
> part of that protection, because there would no longer be any assurance
> that the body of a WAL record had anything to do with its prev_link.

Hm, I hadn't considered the possibility of a prev_link being the only
thing left over from a torn page. As Heikki pointed out, having the CRC
and the rest of the record on opposite sides of the prev_link does
seem like convincing protection, but it's a lot more fiddly and the
dependencies are harder to explain this way.

Another thought that was discussed at the same dinner was separating
the CRC into its own record that would cover all the WAL since the
last CRC. These would only need to be emitted when there's a WAL sync,
not on every record. I think someone showed some benchmarks claiming
that a significant part of the CRC overhead was the startup and
finishing cost of processing lots of small chunks. If the CRC runs over
larger blocks it might be able to make more efficient use of the memory
bandwidth. I'm not entirely convinced of that myself but it bears some
experimentation.
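
To make the idea concrete, here is a rough sketch of a running CRC that is
folded over every record and only emitted at sync time. It is an
illustration under assumptions, not PostgreSQL code: the class and method
names are hypothetical, the record payloads are made up, and Python's
zlib CRC-32 stands in for whatever CRC the WAL actually uses.

```python
import zlib


class WalCrcAccumulator:
    """Hypothetical running CRC over all WAL bytes written since the last sync."""

    def __init__(self):
        self.crc = 0

    def append(self, record: bytes) -> None:
        # Fold this record into the running checksum; nothing is emitted per record.
        self.crc = zlib.crc32(record, self.crc)

    def emit_at_sync(self) -> int:
        # Hand back the CRC covering everything since the previous sync, then reset.
        covered, self.crc = self.crc, 0
        return covered


# Made-up record payloads, purely for illustration.
records = [b"insert tuple 1", b"update tuple 2", b"commit xact 42"]

acc = WalCrcAccumulator()
for rec in records:
    acc.append(rec)
sync_crc = acc.emit_at_sync()

# The accumulated value matches one CRC computed over the concatenated bytes.
assert sync_crc == zlib.crc32(b"".join(records))
```

This only shows that one checksum can cover a whole run of records. It says
nothing about whether the larger blocks are actually faster, which is the
part that would need benchmarking.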
--
greg