Re: Ideas for improving Concurrency Tests

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Greg Stark'" <stark(at)mit(dot)edu>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Ideas for improving Concurrency Tests
Date: 2013-03-27 08:36:37
Message-ID: 001501ce2ac6$3296fc10$97c4f430$@kapila@huawei.com

On Tuesday, March 26, 2013 9:49 PM Greg Stark wrote:
> On Tue, Mar 26, 2013 at 7:31 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> wrote:
> > Above ideas could be useful to improve concurrency testing and can
> also be
> > helpful to generate test cases for some of the complicated bugs for
> which
> > there is no direct test.
>
> I wonder how much explicit sync points would help with testing though.
> It seems like they suffer from the problem that you'll only put sync
> points where you actually expect problems and not where you don't
> expect them -- which is exactly where problems are likely to occur.

We can do it for different kinds of operations. For example:

1. Operations which are done in phases:
a. Create Index Concurrently - Some time back, while going through the
design of Create Index Concurrently, I found a problem which I reported
in the mail below:

http://www.postgresql.org/message-id/006801cdb72e$96b62330$c4226990$@kapila@huawei.com

It occurred because the design/implementation of RelationGetIndexList()
was changed to address Drop Index Concurrently.
Such issues are sometimes difficult to catch through normal tests.
However, if we had defined sync points for each phase and its dependent
operations, it would be comparatively easier to catch such a change.
This one could have been caught if we had defined sync points for step-3
and step-4 as mentioned in that mail.

b. Alter Table - This is also done in 3 phases, so we can define sync
points between each phase and its dependent operations.

2. Some time back, a defect was fixed for a concurrency issue between an
insert cleaning a btree page and vacuum; commit log:
http://www.postgresql.org/message-id/E1Rzvx1-0005nB-1p@gemulon.postgresql.org
Even if such synchronization points are difficult to think of in
advance, once found we can protect them from being broken by some later
change by having test cases for them.
Such tests would also need sync points.

> Wouldn't it be more useful to implicitly create sync points whenever
> synchronization events like spinlocks being taken occur?

That would be really useful, but in such cases how will the test case
specify which action (WAIT, SIGNAL or IGNORE) to take at a sync point?
For example:

S-1
Insert into tbl values(1);
S-2
Select * from tbl;

If S-1 and S-2 run in parallel, it is difficult to say whether '1' will
be visible to S-2.

However, if S-2 waits for a signal in GetSnapshotData() before taking
ProcArrayLock, and S-1 sets that signal after releasing ProcArrayLock in
ProcArrayEndTransaction(), then S-2 can expect to see value '1'.

For the above test, how will we make sure that only S-2 waits in
GetSnapshotData(), and not S-1?

Could you elaborate a bit more? Maybe I am not getting your point
completely.

> And likewise explicitly listing the timing sequences to test seems
> unconvincing. If we could arrange for two threads to execute every
> possible interleaving of code by exhaustively trying every combination
> that would be far more convincing.

I think the main point for this part is how, from the test, we can
synchronize each interleaved part of the code.
Any ideas on how this can be realized?
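For what it's worth, the enumeration side of "every possible
interleaving" is mechanical once each session is broken into atomic
steps; the hard part is pausing real backends at each step (which again
needs sync points). A toy sketch, with the step contents invented:

```python
# Sketch of "try every interleaving": enumerate all order-preserving
# merges of two sessions' atomic steps, then run each schedule serially
# against fresh state.  Driving real backends this way would need a
# scheduler able to pause at every step; here the steps are plain
# Python closures, so the schedules themselves are the interesting part.

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order;
    there are C(len(a)+len(b), len(a)) of them."""
    if not a:
        yield list(b); return
    if not b:
        yield list(a); return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run_all():
    # Two toy "sessions" acting on shared state st.
    s1 = [lambda st: st.__setitem__("x", 1),
          lambda st: st.__setitem__("y", st["x"] + 1)]
    s2 = [lambda st: st.__setitem__("x", 10)]
    results = set()
    for schedule in interleavings(s1, s2):
        st = {"x": 0, "y": 0}
        for step in schedule:
            step(st)
        results.add((st["x"], st["y"]))
    return results

print(sorted(run_all()))   # every outcome the 3 schedules can produce
```

The combinatorial blow-up is real, of course; C(m+n, m) schedules for
two sessions of m and n steps, which is why in practice one would prune
to the steps around synchronization events rather than every statement.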

> Most bugs are likely to hang out in
> combinations we don't see in practice -- for instance having a tuple
> deleted and a new one inserted in the same slot in the time a
> different transaction was context switched out.

With Regards,
Amit Kapila.
