Re: cheaper snapshots

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: cheaper snapshots
Date: 2011-07-28 08:16:41
Message-ID: 7CC1D2D9-1944-42B1-B4AA-D308D424A5A9@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul28, 2011, at 04:51 , Robert Haas wrote:
> One fly in the ointment is that 8-byte
> stores are apparently done as two 4-byte stores on some platforms.
> But if the counter runs backward, I think even that is OK. If you
> happen to read an 8 byte value as it's being written, you'll get 4
> bytes of the intended value and 4 bytes of zeros. The value will
> therefore appear to be less than what it should be. However, if the
> value was in the midst of being written, then it's still in the midst
> of committing, which means that that XID wasn't going to be visible
> anyway. Accidentally reading a smaller value doesn't change the
> answer.

That only works if the update of the most-significant word is guaranteed
to be visible before the update to the lest-significant one. Which
I think you can only enforce if you update the words individually
(and use a fence on e.g. PPC32). Otherwise you're at the mercy of the
compiler.

Otherwise, the following might happen (with a 2-byte value instead of an
8-byte one, and the assumption that 1-byte stores are atomic while 2-bytes
ones aren't. Just to keep the numbers smaller. The machine is assumed to be
big-endian)

The counter is at 0xff00
Backends 1 decrements, i.e. does
(1) STORE [counter+1] 0xff
(2) STORE [counter], 0x00

Backend 2 reads
(1') LOAD [counter+1]
(2') LOAD [counter]

If the sequence of events is (1), (1'), (2'), (2), backend 2 will read
0xffff which is higher than it should be.

But we could simply use a spin-lock to protect the read on machines where
we don't know for sure that 64-bit reads and write are atomic. That'll
only really hurt on machines with 16+ cores or so, and the number of
architectures which support that isn't that high anyway. If we supported
spinlock-less operation on SPARC, x86-64, PPC64 and maybe Itanium, would we
miss any important one?

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2011-07-28 10:50:49 Re: cheaper snapshots
Previous Message Simon Riggs 2011-07-28 07:46:38 Re: cheaper snapshots