Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date: 2018-03-29 20:25:56
Message-ID: CANP8+jJXU1MgZ0gTd8oq7boQKTOsPwnOSMkbhsUcv-9Ao0arsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 March 2018 at 21:03, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On 03/29/2018 08:02 PM, Robert Haas wrote:
>> On Thu, Mar 29, 2018 at 1:13 PM, Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>> On 03/29/2018 06:42 PM, Tom Lane wrote:
>>>> The long and the short of it is that this is a very dangerous-looking
>>>> proposal, we are at the tail end of a development cycle, and there are
>>>> ~100 other patches remaining in the commitfest that also have claims
>>>> on our attention in the short time that's left. If you're expecting
>>>> people to spend more time thinking about this now, I feel you're being
>>>> unreasonable.
>>>
>>> I agree.
>>
>> +1.
>>
>>>> Also, I will say it once more: this change DOES decrease robustness.
>>>> It's like blockchain without the chain aspect, or git commits without
>>>> a parent pointer. We are not only interested in whether individual
>>>> WAL records are valid, but whether they form a consistent series.
>>>> Cross-checking xl_prev provides some measure of confidence about that;
>>>> xl_curr offers none.
>>>
>>> Not sure.
>>>
>>> If each WAL record has xl_curr, then we know to which position the
>>> record belongs (after verifying the checksum). And we do know the size
>>> of each WAL record, so we should be able to deduce if two records are
>>> immediately after each other. Which I think is enough to rebuild the
>>> chain of WAL records.
>>>
>>> To defeat this, this would need to happen:
>>>
>>> a) the WAL record gets written to a different location
>>> b) the xl_curr gets corrupted in sync with (a)
>>> c) the WAL checksum gets corrupted in sync with (b)
>>> d) the record overwrites existing record (same size/boundaries)
>>>
>>> That seems very much like xl_prev.
>>
>> I don't think so, because this ignores, for example, timeline
>> switches, or multiple clusters accidentally sharing an archive
>> directory.
>>
>
> I'm curious? How does xl_prev prevents either of those issues (in a way
> that xl_curr would not)?
>
>> TBH, I'm not entirely sure how likely corruption would be under this
>> proposal in practice, but I think Tom's got the right idea on a
>> theoretical level. As he says, there's a reason why things like
>> block chains and git commits include something about the previous
>> record in what gets hashed to create the identifier for the new
>> record. It ties those things together in a way that doesn't happen
>> otherwise. If somebody proposed changing git so that the SHA
>> identifier for a commit didn't hash the commit ID for the previous
>> commit, that would break the security of the system in all kinds of
>> ways: for example, an adversary could maliciously edit an old commit
>> by just changing its SHA identifier and that of its immediate
>> descendent. No other commit IDs would change and integrity checks
>> would not fail -- there would be no easy way to notice that something
>> bad had happened.
>>
>
> That seems like a rather poor analogy, because we're not protected
> against such adversarial modifications with xl_prev either. I can damage
> the previous WAL record, update its checksum and you won't notice anything.
>
> My understanding is that xl_curr and xl_prev give us pretty much the
> same amount of information, except that xl_prev provides it explicitly
> while with xl_curr it's implicit and we need to deduce it.
>
>> Now, in our case, the chances of problems are clearly a lot more
>> remote because of the way that WAL records are packed into files.
>> Every WAL record has a place where it's supposed to appear in the WAL
>> stream, and positions in the WAL stream are never intentionally
>> recycled (except in PITR scenarios, I guess). Because of that, the
>> proposed xl_curr check provides a pretty strong cross-check on the
>> validity of a record even without xl_prev. I think there is an
>> argument to be made - as Simon is doing - that this check is in fact
>> strong enough that it's good enough for government work. He might be
>
> Is he? I think the claims in this thread were pretty much that xl_curr
> and xl_prev provide the same level of protection.

Yes, the blockchain analogy breaks down because we don't include
previous CRC, only xl_prev. The existing WAL format is easy enough to
repair, falsify etc. You've still got to keep your WAL data secure.

So the image of a CSI-style chain-of-evidence thing happening isn't
technical reality.

You could change the existing format to include the previous CRC in
the calculation of the current record's CRC. I'd be happy with that as
an option, just as much as I'd support an option for high performance
as I've proposed here. But bear in mind that including the CRC would
mean that you couldn't repair data damaged by data corruption because
it would mean all WAL after the corrupt page would be effectively
scrambled.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2018-03-29 20:26:30 Re: Enhance pg_stat_wal_receiver view to display connected host
Previous Message Robert Haas 2018-03-29 20:11:04 Re: [HACKERS] why not parallel seq scan for slow functions