From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Moving more work outside WALInsertLock |
Date: | 2011-12-16 12:07:19 |
Message-ID: | 4EEB3477.4080502@enterprisedb.com |
Lists: | pgsql-hackers |
On 16.12.2011 05:27, Tom Lane wrote:
> * We write a WAL record that starts 8 bytes before a sector boundary,
> so that the prev_link is in one sector and the rest of the record in
> the next one(s).
prev-link is not the first field in the header. The CRC is.
> * Time passes, and we recycle that WAL file.
>
> * We write another WAL record that starts 8 bytes before the same sector
> boundary, so that the prev_link is in one sector and the rest of the
> record in the next one(s).
>
> * System crashes, after having written out the earlier sector but not
> the later one(s).
>
> On restart, the replay code will see a prev_link that matches what it
> expects. If the CRC for the remainder of the record is not dependent
> on the prev_link, then the remainder of the old record will look good
> too, and we'll attempt to replay it, n*16MB too late.
The CRC would be in the previous sector with the prev-link, so the CRC
of the old record would have to match the CRC of the new record. I guess
that's not totally impossible, though - there could be some WAL-logged
operations where the payload of the WAL record is often exactly the
same, such as a heap clean record when the same page is repeatedly pruned.
> Including the prev_link in the CRC adds a significant amount of
> protection against such problems. We should not remove this protection
> in the name of shaving a few cycles.
Yeah. I did some quick testing with a patch to leave prev-link out of
the calculation, and move the record CRC calculation outside the lock,
too. I don't remember the numbers, but while it did make some
difference, it didn't seem worthwhile.
Anyway, I'm looking at ways to make the memcpy() of the payload happen
without the lock, in parallel, and once you do that the record header
CRC calculation can be done in parallel, too. That makes it irrelevant
from a performance point of view whether the prev-link is included in
the CRC or not.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com