From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Magnus Hagander <magnus(at)hagander(dot)net>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Online enabling of checksums |
Date: | 2018-04-06 17:22:06 |
Message-ID: | 20180406172206.hplj42fwzoas6xi3@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote:
> > Oh, that's not my intention either -- I just wanted to make sure I
> > was thinking about the same issue you were.
> I agree we shouldn't rely on chance here - if we might read a stale
> value, we need to fix that of course.
It's perfectly possible that some side-conditions mitigate this. What
concerns me that
a) Nobody appears to have raised this issue beforehand, besides an
unlocked read of a critical variable being a fairly obvious
issue. This kind of thing needs to be carefully thought about.
b) If there's some "side channel" interlock, it's not documented.
I noticed the issue because of an IM question about the general feature,
and I did a three minute skim and saw the read without a comment.
> I'm not quite sure I fully understand the issue, though. I assume both
> LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
> happen the process would need to be already past LockBufHdr when the
> checksum version is updated. In which case it can use a stale version
> when writing the buffer out. Correct?
Yes, they're are memory barriers.
> I wonder if that's actually a problem, considering the checksum worker
> will then overwrite all data with correct checksums anyway. So the other
> process would have to overwrite the buffer after checksum worker, at
> which point it'll have to go through LockBufHdr.
Again, I'm not sure if there's some combination of issues that make this
not a problem in practice. I just *asked* if there's something
preventing this from being a problem.
The really problematic case would be if it is possible for some process
to wait long enough, without executing a barrier implying operation,
that it'd try to write out a page that the checksum worker has already
passed over.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2018-04-06 17:22:34 | Re: WIP: Covering + unique indexes. |
Previous Message | Teodor Sigaev | 2018-04-06 17:20:21 | Re: WIP: Covering + unique indexes. |