From: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation() |
Date: | 2018-02-12 11:03:51 |
Message-ID: | CABOikdPQwNKmtiwGVFnLhWih=kHGHJuu===Xh7LOki_4fS5J_A@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Feb 2, 2018 at 9:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Feb 1, 2018 at 7:21 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > Yes, it would be about 99% of the time.
> >
> > But you have it backwards - we are not assuming that case. That is the
> > only case that has risk - the one where an old WAL record starts at
> > exactly the place the latest one stops. Otherwise the rest of the WAL
> > record will certainly fail the CRC check, since it will effectively
> > have random data in it, as you say.
>
> OK, I get it now. Thanks for explaining. I think I understand now
> why you think this problem can be solved just by controlling the way
> we recycle segments, but I'm still not sure if that can really be made
> fully reliable. Michael seems concerned about what might happen after
> multiple recyclings, and Tom has raised the issue of old data
> reappearing after a crash.
>
I'm not sure whether Michael has spotted a real problem or was just raising a
concern. He himself later rightly pointed out that when a WAL file is
switched, the old file is filled with zeros. So I don't see a problem
there. Maybe I am missing something and Michael can explain further.

Regarding Tom's concerns, that could be a problem if a rename survives a file
system crash, but the data subsequently written to the file does not.
For this to be a problem, WAL file A is renamed to B and then renamed to C.
Files A and C share the same low order bits. Further, after the file system crash,
the file is correctly named C, but the data written *before* the rename
operation is lost. Is that a real possibility? Can we delay reusing low
order bits a little further to address this problem? Of course, if the file
system can preserve many renames across a crash and still resurrect old data
from several renames earlier, then we would have the same problem.
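
To make the A -> B -> C scenario concrete, here is a toy sketch. It is not code from the patch: the tag width `LOW_BITS`, the segment numbers, and all function names are hypothetical, and it only assumes that a record is accepted when the low order bits it carries match those of the segment being read.

```python
# Toy model of the rename scenario; LOW_BITS and all names are hypothetical.
LOW_BITS = 4                      # assumed count of low order bits kept per record
MASK = (1 << LOW_BITS) - 1


def stamp(segno):
    """Tag that a record written into segment `segno` would carry."""
    return segno & MASK


def looks_valid(record_tag, segno):
    """Accept a record if its tag matches the segment currently being read."""
    return record_tag == stamp(segno)


def earliest_colliding_segment(segno):
    """First later segment number whose low order bits equal segno's."""
    return segno + (1 << LOW_BITS)


seg_a = 5
seg_b = 9                                     # first recycle target, different low bits
seg_c = earliest_colliding_segment(seg_a)     # second recycle target, same low bits as A

# The file is renamed A -> B -> C. After the crash the name C survives, but
# everything written while the file was B is lost, so it still holds A's records.
stale_tag = stamp(seg_a)

collision = looks_valid(stale_tag, seg_c)     # stale record from A accepted in C
no_collision = looks_valid(stale_tag, seg_b)  # same record rejected if file were B
```

In this model, raising `LOW_BITS` pushes `earliest_colliding_segment` further out, which is the "delay reusing low order bits" idea; it does not remove the collision if data from arbitrarily many renames back can reappear.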
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services