Re: PostgreSQL XAResource & GlassFish 3.1.2.2

From: Bryan Varner <bvarner(at)polarislabs(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: PostgreSQL XAResource & GlassFish 3.1.2.2
Date: 2013-02-12 16:01:31
Message-ID: 511A675B.3020207@polarislabs.com
Lists: pgsql-jdbc

>> * Race conditions as multiple threads are participating in the same
>> transaction, invoking XAResource methods. Status checks in
>> PGXAConnection.java are throwing exceptions (if (state == ACTIVE)
>> throw), but by the time it invokes the throw, the state is != ACTIVE.
>> Before you start telling me I shouldn't be using threads in a JEE
>> environment let me remind you that EJBs by default are served out of
>> thread pools, and allow for concurrent threads to participate within a
>> single TX scope. This is outlined as part of the transaction context
>> in the JTS spec (section 2.2 and 2.3) and synchronized thread-safe
>> access to XAResources is described (without being explicitly called
>> out) by the JTA 1.0.1 spec.
>
> We could fairly easily just add "synchronized" to all the public
> methods. I wonder how sane it is for Glassfish to be doing that in the
> first place, though. AFAICS, in any combination of two XAResource
> methods, called concurrently, one of the threads will get an error
> anyway. For example, if two threads try to call start() at the same
> time, one of them has to fail because an XAResource can only be
> associated with one transaction at a time.

I think there's some confusion between a thread and a logical
transaction (represented by a physical connection to the db), with an
XID managed by a Transaction Manager.

In a JEE container, it's expected that multiple threads will do work on
behalf of a single XAResource, managed by the transaction manager. A
single XID (XAResource) will have multiple threads doing work on its
behalf. This does not necessitate interleaving, but it does mean that
multiple threads can be invoking start() and end() on an XAResource.

>> * It appears that a second thread attempting to join an existing
>> XAResource's scope with start(XID, TMJOIN) is going to be refused,
>> even if it's attempting to participate in the same XID. The exception
>> thrown is one complaining about interleaving, even though it's the
>> -same- XID, not a sub-set of work in another TX.
>
> Hmm, so the application server is trying to do something like this:
>
> xares.start(1234, 0);
> xares.start(1234, TMJOIN);
>
> We could easily allow that in the driver (i.e. do nothing), but that
> doesn't seem like a valid sequence of calls to begin with. You must
> disassociate the XAResource from the current transaction with end(),
> before re-associating it with start().

You're correct. After doing a bunch more reading, I agree that the code
path above is invalid.

What should be valid (and is not considered interleaving), is:

Thread A                         Thread B
--------                         ---------
xares.start(1234, TMNOFLAGS);
doStuff();
xares.end(1234);
                                 xares.start(1234, TMJOIN);
                                 doStuff();
                                 xares.end(1234);
xares.start(1234, TMJOIN);
doStuff();
xares.end(1234);

So long as the TM is serializing execution of A and B and not allowing
branch interleaving.
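
As a rough illustration of the sequence above, here is a toy Java model of
the association rules. It is not the driver's code: ToyResource, runUnit and
demo are hypothetical names, the xid is a plain int, end() takes no flags
(as in the listing above), and the three units are assumed to run on threads
A, B and A in turn. Thread.join() stands in for a TM that serializes the
threads.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of XAResource association rules; not the PostgreSQL driver's code. */
public class JoinSequenceSketch {
    // Same values as javax.transaction.xa.XAResource.TMNOFLAGS and TMJOIN.
    static final int TMNOFLAGS = 0;
    static final int TMJOIN = 0x200000;

    enum State { IDLE, ACTIVE, ENDED }

    /** Hypothetical stand-in for an XAResource; tracks one branch at a time. */
    static class ToyResource {
        private State state = State.IDLE;
        private int currentXid = -1;
        final List<String> log = new ArrayList<>();

        synchronized void start(int xid, int flags) {
            if (state == State.ACTIVE) {
                throw new IllegalStateException("start() without a preceding end()");
            }
            if (flags == TMJOIN) {
                if (state != State.ENDED || xid != currentXid) {
                    throw new IllegalStateException("TMJOIN needs an ended branch with the same xid");
                }
            } else if (state != State.IDLE) {
                throw new IllegalStateException("interleaving is not supported in this toy model");
            }
            currentXid = xid;
            state = State.ACTIVE;
            log.add(Thread.currentThread().getName() + " start");
        }

        synchronized void end(int xid) {
            if (state != State.ACTIVE || xid != currentXid) {
                throw new IllegalStateException("end() without a matching start()");
            }
            state = State.ENDED;
            log.add(Thread.currentThread().getName() + " end");
        }
    }

    /** Runs one start/end unit on a named thread and waits for it to finish. */
    static void runUnit(ToyResource res, String threadName, int xid, int flags) {
        Thread t = new Thread(() -> { res.start(xid, flags); res.end(xid); }, threadName);
        t.start();
        try {
            t.join(); // the stand-in "TM" serializes the units
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    static List<String> demo() {
        ToyResource res = new ToyResource();
        runUnit(res, "A", 1234, TMNOFLAGS);
        runUnit(res, "B", 1234, TMJOIN);
        runUnit(res, "A", 1234, TMJOIN);
        return res.log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [A start, A end, B start, B end, A start, A end]
    }
}
```

In the same toy model, calling start(1234, TMNOFLAGS) and then
start(1234, TMJOIN) without an end() in between is rejected, which matches
the invalid code path discussed earlier in this thread.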

In this case, the XAResource is performing work on behalf of more than
one thread, but in the same XID context. The problem I think I'm seeing
at this point (still trying to coordinate with the glassfish devs) is
that the XAResource isn't fully completing execution of end() prior to
the other thread invoking start(), even though the method invocation
appears to be happening 'in order'. This would manifest as a classic
race condition, and would not constitute transaction interleaving, since
the XID in use is the same TX branch.

I'm working on a test case as part of the XAResource test suite in the
driver codebase. As I'm doing this, I'm also trying to nail down how
glassfish is synchronizing access to XAResources, so this is taking me
some time.

What I can tell you is that I'm seeing exception cases in my prod
environment where currentXid.equals(xid) is true, but where the state field
in XAConnection hasn't been updated by concurrent calls to
start()/end() in time to pass the interleaving pre-condition checks. My
current hypothesis is that GF isn't trying to do interleaving, but the
internal state field isn't being updated 'fast enough' (thread-safely)
to avoid race conditions in non-interleaved, but multi-threaded
environments.
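
To make the hypothesis concrete, here is a minimal sketch of what "updated
thread-safely" could look like, along the lines of the suggestion to mark the
public methods synchronized. It is a toy, not PGXAConnection: GuardedResource
and raceOnStart are hypothetical names, and the model ignores xids and flags.
The only point is that the state check and the state change happen under one
monitor, so a racing caller either wins or is refused cleanly, and the
updated state is visible to the next caller.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy illustration of an atomic check-then-update; not PGXAConnection itself. */
public class GuardedStateSketch {
    enum State { IDLE, ACTIVE, ENDED }

    /** Hypothetical resource whose state check and state change share one monitor. */
    static class GuardedResource {
        private State state = State.IDLE; // only touched while holding the monitor

        synchronized void start() {
            if (state == State.ACTIVE) {
                throw new IllegalStateException("already associated with a transaction");
            }
            state = State.ACTIVE;
        }

        synchronized void end() {
            if (state != State.ACTIVE) {
                throw new IllegalStateException("not associated with a transaction");
            }
            state = State.ENDED;
        }

        synchronized State state() {
            return state;
        }
    }

    /** Lets several callers race on start(); returns how many of them succeeded. */
    static int raceOnStart(GuardedResource res, int threads) {
        CountDownLatch go = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                try {
                    go.await();      // line all callers up
                    res.start();     // check and update happen atomically
                    winners.incrementAndGet();
                } catch (IllegalStateException | InterruptedException e) {
                    // losing callers are refused; nothing is left half-updated
                }
            });
            pool[i].start();
        }
        go.countDown();
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        GuardedResource res = new GuardedResource();
        System.out.println("winners = " + raceOnStart(res, 8)); // winners = 1
        res.end();
        System.out.println("state = " + res.state());           // state = ENDED
    }
}
```

This does not reproduce the production failure, which depends on timing. It
only shows the guarded shape: with the plain field read and written outside
any lock, nothing guarantees that a thread calling start() sees the result
of another thread's end().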

Regards,
-Bryan
