From: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
---|---|
To: | Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Removing PD_ALL_VISIBLE |
Date: | 2013-01-17 09:55:07 |
Message-ID: | CABOikdPHOPZ5VV1dHFu+WczFX2esvPrRzTvMXNJyy2AM1Bz83w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jan 17, 2013 at 2:57 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>wrote:
>
>
> There was considerable discussion after this (accessible through the
> archives link above), which I won't attempt to summarise.
>
>
I thought Robert made those comments after considerable discussions on
Jeff's approach. So he probably still stands by his objections or at least
not satisfied/seen the numbers.
Now that I look at the patch, I wonder if there is another fundamental
issue with the patch. Since the patch removes WAL logging for the VM set
operation, this can happen:
1. Vacuum kicks in and clears all dead tuples in a page and decides that
its all-visible
2. Vacuum WAL-logs the cleanup activity and marks the page dirty
3. Vacuum sets the visibility bit and marks the VM page dirty
4. Say the VM page gets written to the disk. The heap page is not yet
written neither the WAL log corresponding to the cleanup operation
5. CRASH
After recovery, the VM bit will remain set because the VM page got written
before the crash. But since heap page's cleanup WAL did not made to the
disk, those operations won't be replayed. The heap page will be left with
not-all-visible tuples in that case and its not a good state to be in.
The original code does not have this problem because the VM set WAL gets
written after the heap page cleanup WAL. So its guaranteed that the VM bit
will be set during recovery only if the cleanup WAL is replayed too (there
is more magic than what meets the eye and I think its not fully
documented).
Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
From | Date | Subject | |
---|---|---|---|
Next Message | Dave Page | 2013-01-17 10:09:19 | Re: review: pgbench - aggregation of info written into log |
Previous Message | Magnus Hagander | 2013-01-17 09:36:47 | Re: review: pgbench - aggregation of info written into log |