From: | 高增琦 <pgf00a(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: crash-safe visibility map, take four |
Date: | 2011-03-31 08:33:36 |
Message-ID: | AANLkTin+hWh8QE83XjN9J1br4Qn7_qYwQY-vGWA-nduQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Mar 30, 2011 at 8:52 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 30.03.2011 06:24, 高增琦 wrote:
>
>> Should we always do full-page writes for the visibility map?
>> Currently, when a visibility map bit is cleared, there is no full-page
>> write for the vm page, since we don't save buffer info in the
>> insert/update/delete WAL records.
>>
>> Full-page writes are used to protect pages from disk failure. Without
>> them:
>> 1) set vm: the vm bits that should be set to 1 may still be 0
>> 2) clear vm: the vm bits that should be set to 0 may still be 1
>> Are these true? Or is the page content totally unpredictable?
>>
>
> Not quite. The WAL replay will set or clear vm bits, regardless of full
> page writes. Full page writes protect from torn pages, i.e. the problem where
> some operations on a page have made it to disk while others have not. That's
> not a problem for VM pages, as each bit on the page can be set or cleared
> individually. But for something like a heap page where you have an offset in
> the beginning of the page that points to the tuple elsewhere on the page,
> you have to ensure that they stay in sync, even if you don't otherwise care
> if the update makes it to disk or not.
>
>
Consider an example:
1. Deletes on two heap pages emit two WAL records: (1, page1, vm_clear_1)
and (2, page2, vm_clear_2).
2. "vm_clear_1" and "vm_clear_2" are on the same vm page.
3. A checkpoint happens and the vm page gets torn: vm_clear_2 is lost.
4. A delete on another page emits one WAL record, (3, page1, vm_clear_3);
vm_clear_3 is still on that same vm page.
5. Power goes down.
6. On startup, redo replays all changes after the checkpoint, but the bit
for vm_clear_2 will never be cleared.
Am I right?
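
The six steps above can be traced with a toy model. This is only a sketch, not PostgreSQL code: the 4-bit "sector", the block numbers, and all function names are made up for illustration, and it takes the premise of step 3 as given (that a torn vm page write can survive a completed checkpoint), which is exactly the point being asked about.

```python
from dataclasses import dataclass

SECTOR_BITS = 4  # assumption: a tiny "sector" of 4 vm bits, for illustration only


@dataclass
class WalRecord:
    lsn: int
    heap_block: int  # heap block whose all-visible bit this record clears


def clear_bit(page, heap_block):
    page[heap_block] = 0


def torn_write(disk, memory, sectors_written):
    """Copy only the first `sectors_written` sectors of `memory` to `disk`."""
    n = sectors_written * SECTOR_BITS
    disk[:n] = memory[:n]


def redo(disk, wal, checkpoint_lsn):
    """Replay every record after the checkpoint onto the on-disk page."""
    page = list(disk)
    for rec in wal:
        if rec.lsn > checkpoint_lsn:
            clear_bit(page, rec.heap_block)
    return page


def run_scenario():
    disk = [1] * 8       # vm bits as last written to disk (all set)
    memory = list(disk)  # the same vm page in shared buffers
    wal = []

    # Steps 1-2: two deletes clear two bits on the same vm page.
    for lsn, block in ((1, 0), (2, 5)):
        wal.append(WalRecord(lsn, block))
        clear_bit(memory, block)

    # Step 3: the checkpoint flushes the page, but only sector 0 reaches disk.
    torn_write(disk, memory, sectors_written=1)
    checkpoint_lsn = 2

    # Step 4: another delete clears a third bit on the same vm page.
    wal.append(WalRecord(3, 1))
    clear_bit(memory, 1)

    # Steps 5-6: power loss drops `memory`; redo starts from the checkpoint.
    return redo(disk, wal, checkpoint_lsn)


recovered = run_scenario()
print(recovered)
```

In this model the bit for block 5 (the "vm_clear_2" of the example) is still set after redo, because record 2 precedes the checkpoint and is never replayed, while the bits for records 1 and 3 are cleared.
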
>> Another question:
>> To address the problem in
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php,
>> should we just clear the vm bit before writing the WAL record for the
>> insert/update/delete?
>> This may reduce performance. Is there another solution?
>>
>
> Yeah, that's a straightforward way to fix it. I don't think the performance
> hit will be too bad. But we need to be careful not to hold locks while doing
> I/O, which might require some rearrangement of the code. We might want to do
> a similar dance that we do in vacuum, and call visibilitymap_pin first, then
> lock and update the heap page, and then set the VM bit while holding the
> lock on the heap page.
>
>
Do you mean we should lock the heap page first, get the block number,
release the heap page lock, then pin the vm page, and then lock both the
heap page and the vm page?
As Robert Haas said, when we lock the heap page again, there may not be
enough free space left on it.
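
The ordering under discussion (pin the vm page first, then lock and update the heap page, then update the vm bit while still holding the heap lock) can be sketched as a toy model. Only the name visibilitymap_pin comes from the quoted text; the Buffer class, the events list, and delete_tuple are hypothetical, and none of this is the real buffer manager API.

```python
import threading


class Buffer:
    """Hypothetical stand-in for a shared buffer: a lock plus a pin count."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.pins = 0


events = []  # order of operations, recorded so it can be inspected


def visibilitymap_pin(vm_buf):
    # Pinning may have to read the vm page from disk, so it is done
    # before any buffer lock is taken.
    vm_buf.pins += 1
    events.append("pin vm page")


def delete_tuple(heap_buf, vm_buf, vm_bits, heap_block):
    visibilitymap_pin(vm_buf)
    with heap_buf.lock:
        events.append("lock heap page")
        events.append("modify heap page")
        # The vm page is already pinned, so no I/O happens under the lock.
        vm_bits[heap_block] = 0
        events.append("clear vm bit")
    events.append("unlock heap page")
    vm_buf.pins -= 1


heap = Buffer("heap block 3")
vm = Buffer("vm page 0")
bits = [1] * 8
delete_tuple(heap, vm, bits, heap_block=3)
print(events)
```

The sketch takes heap_block as already known before the pin. How to learn the target block without first locking the heap page is the open point raised in the question above.
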
Is there a way to just stop the checkpoint for a while?
Thanks.
GaoZengqi