Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)

From: Michael Tautschnig <mt(at)debian(dot)org>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Jade Alglave <jade(dot)alglave(at)cs(dot)ox(dot)ac(dot)uk>, Vincent Nimal <vincent(dot)nimal(at)balliol(dot)ox(dot)ac(dot)uk>, Daniel Kroening <kroening(at)cs(dot)ox(dot)ac(dot)uk>
Subject: Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)
Date: 2012-02-29 15:18:54
Message-ID: 20120229151854.GJ92164@l04.local

Hi all,

[Bcc'ed Tom Lane as he had done the initial investigation on this.]

Following up on the earlier discussions in

[1] http://archives.postgresql.org/pgsql-hackers/2010-11/msg01575.php
and
[2] http://archives.postgresql.org/pgsql-hackers/2011-08/msg00330.php
with an initial fix in
[3] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4e15a4db5e65e43271f8d20750d6500ab12632d0

we (a group of researchers from Oxford and London) carried out a more formal
analysis using a chain of automated verification tools. This tool suite lets us
automatically check for the presence or absence of bugs in (smaller) pieces of
software when run under weak memory-model semantics, as is the case on x86 or
PowerPC. [In this context we are always interested in further source-code
examples that make significant use of shared-memory concurrency on such
platforms.]

Using our tool chain, we were eager to (1) confirm the bug and (2) validate the
proposed fix. See the very end for our analysis of the proposed fix.

The example that Tom Lane initially posted in [2] can be further simplified to
the following bit of self-contained C code (with line numbers), where WaitLatch
is modelled as a simple busy wait, "while(!latch[.]);", and ResetLatch is just
"latch[.]=0;". Running two of these worker functions in parallel then suffices
to reproduce the problem that was initially reported (a sketch of a test
harness follows the listing).

 1  #define WORKERS 2
 2  volatile _Bool latch[WORKERS];
 3  volatile _Bool flag[WORKERS];
 4
 5  void worker(int i)
 6  {
 7      while(!latch[i]);
 8      for(;;)
 9      {
10          assert(!latch[i] || flag[i]);
11          latch[i] = 0;
12          if(flag[i])
13          {
14              flag[i] = 0;
15              flag[(i+1) % WORKERS] = 1;
16              latch[(i+1) % WORKERS] = 1;
17          }
18
19          while(!latch[i]);
20      }
21  }
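
For completeness, a minimal pthread harness along the following lines is enough
to run two workers in parallel. The thread plumbing and the initial hand-over to
worker 0 are our own (hypothetical) additions for illustration, not part of the
original latch code; the listing above additionally needs #include <assert.h>
for the assert at line 10.

#include <pthread.h>
#include <stdint.h>

/* Illustrative harness only: spawn WORKERS threads running worker() and
   hand the initial token to worker 0. */
static void *worker_thread(void *arg)
{
    worker((int)(intptr_t)arg);     /* worker() as listed above; never returns */
    return NULL;
}

int main(void)
{
    pthread_t threads[WORKERS];

    /* Assumed initial state: worker 0 owns the token. */
    flag[0] = 1;
    latch[0] = 1;

    for (int i = 0; i < WORKERS; i++)
        pthread_create(&threads[i], NULL, worker_thread, (void *)(intptr_t)i);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(&threads[i], NULL);

    return 0;
}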

We developed a tool that analyses a piece of concurrent C code and determines
whether it contains such bugs. Our tool confirms the presence of at least two
bugs in the piece of code under discussion.

The first problem corresponds to a message-passing idiom (view with a
fixed-width font; the prefix in front of each statement gives the worker/thread
and the line number it is executing):

Worker 0 | Worker 1
===================+========================
(0:15) flag[1]=1; | (1:7) while (!latch[1]);
(0:16) latch[1]=1; | (1:12) if (flag[1])

where we can observe latch[1] holding 1 and flag[1] holding 0 in the end.

This behaviour can happen on the PowerPC architecture for three reasons. First,
write-write pairs can be reordered, for example the write to flag and the write
to latch by Worker 0. Second, read-read pairs can be reordered, for example the
read of the while condition and the read of the if condition by Worker 1.
Finally, the store to latch by Worker 0 might not be atomic.

This corresponds to a scenario where we first execute Worker 0 up to line 17.
All the reads and writes go directly to memory, except the assignments at lines
14 and 15, where the values 0 and 1 are stored respectively into the
processor-local buffers of flag[0] and flag[1]. Then the second worker starts,
reading the freshly updated value 1 of latch[1]. It then exits the blocking
while (line 7). But latch[1] still holds 1, and flag[1] still holds 0, as
Worker 0 has not yet flushed the write of 1 waiting in its buffer. This means
that the condition of the if statement would not be true, the critical section
would be skipped, and the program would arrive at line 19 without having
authorised the next worker to enter its critical section, and would then loop
forever.

This seems to correspond to the scenario discussed in [2] above, where the wait
at line 19 times out (in the simplified code above it simply loops forever).
This is consistent with the fact that this behaviour is commonly observed on
several generations of Power machines (e.g. 1.7G/167G on Power 7).
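
For readers who prefer the C11 atomics vocabulary to Power barriers: this
message-passing idiom is exactly the pattern that release/acquire ordering is
meant to forbid. The sketch below is our own illustration (the names are ours,
and it is not the proposed PostgreSQL fix): a release store keeps the flag
write from being reordered after the latch write, and an acquire load keeps the
flag read from being reordered before the latch read, so the outcome "latch[1]
holding 1 and flag[1] holding 0" is no longer observable.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool latch1;
static atomic_bool flag1;

void worker0_hand_over(void)                /* Worker 0, lines 15-16 */
{
    atomic_store_explicit(&flag1, true, memory_order_relaxed);
    /* release: the flag write above cannot be reordered after this store */
    atomic_store_explicit(&latch1, true, memory_order_release);
}

bool worker1_check(void)                    /* Worker 1, lines 7 and 12 */
{
    /* acquire: the flag read below cannot be reordered before this load */
    while (!atomic_load_explicit(&latch1, memory_order_acquire))
        ;
    return atomic_load_explicit(&flag1, memory_order_relaxed);
}

On Power, the release store and the acquire load would typically be implemented
with an lwsync (or a dependency-based idiom), which is what rules out the
reorderings described above.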

Let us now focus on the second bug; it corresponds to a load buffering idiom

Worker 0 | Worker 1
=====================+========================
(0:12) if (flag[0]) | (1:12) if (flag[1])
(0:15) flag[1]=1; | (1:15) flag[0]=1;

where we can observe both flag[0] and flag[1] holding 1 in the end.

This behaviour is valid on the PowerPC architecture for two reasons. First,
PowerPC can reorder read-write pairs, hence the read of the if condition on
either thread can be performed after that thread has already set the flag to 1.
Second, PowerPC's relaxed store atomicity can give rise to the same outcome.
Our understanding is that this behaviour is not yet exhibited by real-world
Power machines. Yet, since it is documented to be permitted by the
architecture, it could lead to actual bugs on future generations of Power
machines if not synchronised correctly.
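
The load-buffering idiom can likewise be written down as a small C11-style
litmus test (again purely our own illustration, not PostgreSQL code). With
relaxed atomics the final state r0 == 1 and r1 == 1 is permitted by the C11
model, mirroring what the Power architecture allows on paper; making the loads
acquire and the stores release (or simply using the sequentially consistent
default) forbids it.

#include <stdatomic.h>

static atomic_int flag0, flag1;

void worker0_lb(int *r0)
{
    *r0 = atomic_load_explicit(&flag0, memory_order_relaxed);  /* read, cf. line 12 */
    atomic_store_explicit(&flag1, 1, memory_order_relaxed);    /* write, cf. line 15 */
}

void worker1_lb(int *r1)
{
    *r1 = atomic_load_explicit(&flag1, memory_order_relaxed);  /* read, cf. line 12 */
    atomic_store_explicit(&flag0, 1, memory_order_relaxed);    /* write, cf. line 15 */
}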

In [3] it was suggested to fix the problem by placing a barrier in ResetLatch,
which corresponds to placing it between lines 11 and 12 in the code above. This
amounts to placing a barrier between the two reads of Worker 1 (lines 7/19 and
12, i.e., between WaitLatch and the if(flag[1])).

Placing a sync (i.e., the strongest Power barrier) there would, however, still
be insufficient for the second problem, as it would only fix the reordering of
read-read pairs by Worker 1 and the store atomicity issue from Worker 0; the
writes on Worker 0 could still be reordered (problem number 2). One possible
fix consists of placing a sync between the two writes on Worker 0, and an
address dependency between the two reads on Worker 1. Clearly, however, these
are changes that can no longer be hidden behind the ResetLatch/WaitLatch
interface, but have to go into the code that uses it.
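
To make this concrete, here is a sketch (illustration only, not a patch) of the
worker with both orderings enforced. We use the GCC builtin
__sync_synchronize(), a full memory barrier that typically compiles to sync on
Power, merely as a stand-in for whatever barrier primitive PostgreSQL provides,
and, for simplicity, a full barrier on the read side where an address
dependency would already suffice:

void worker_fixed(int i)
{
    while (!latch[i]);                      /* WaitLatch */
    for (;;)
    {
        latch[i] = 0;                       /* ResetLatch */
        __sync_synchronize();               /* barrier of [3], between lines 11 and 12:
                                               orders the latch read(s) before the
                                               flag read */
        if (flag[i])
        {
            flag[i] = 0;
            flag[(i + 1) % WORKERS] = 1;
            __sync_synchronize();           /* the "sync between the two writes":
                                               orders the flag write before the
                                               latch write */
            latch[(i + 1) % WORKERS] = 1;   /* SetLatch */
        }

        while (!latch[i]);                  /* WaitLatch */
    }
}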

Please let us know if we got into too much detail here and if you have any
questions, or would like further discussion/validation of potential fixes!

Best,
Michael

(on behalf of the team: Jade Alglave, Daniel Kroening, Vincent Nimal, Michael
Tautschnig, as CC'ed)
