msdtc with 32-bit app fails to resolve in-doubt or not-notified transactions

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: "pgsql-odbc(at)postgresql(dot)org" <pgsql-odbc(at)postgresql(dot)org>
Subject: msdtc with 32-bit app fails to resolve in-doubt or not-notified transactions
Date: 2014-06-20 16:03:37
Message-ID: 53A45B59.70303@2ndquadrant.com
Lists: pgsql-odbc

Hi folks

I've found an issue with psqlODBC's MSDTC support and pgxalib.dll, where
a 32-bit application on a 64-bit server will intermittently leave
transactions in the "only failed to notify" state in MSDTC.

This occurs when:

- The application exits normally after its final ITransaction::Commit
call returns but before MSDTC has invoked
ITransactionResourceAsync::CommitRequest on the psqlODBC-provided
IAsyncPG object; or

- The application or server crashes after MSDTC Phase I but before
Phase II.

In both these cases the resource manager is supposed to handle
transaction resolution. It uses pgxalib.dll for this as that's the
registered XA co-ordinator for the resource type.

I've been able to trace pgxalib.dll (which, by the way, was painful; I'll
follow up on that) and found that XAConnection::xa_recover() is being
called on the transaction, as expected. It calls into
XAConnection::ActivateConnection, which fails to establish an ODBC
connection and bails out at the test at 142 after getting return code -1
from SQLDriverConnect(...).

http://msdn.microsoft.com/en-us/library/ms716219(v=vs.85).aspx

suggests that this is SQL_ERROR. pgxalib.dll doesn't call SQLGetDiagRec
or SQLGetDiagField to get any details and log them; I'll submit a
separate patch for that.

It took me a while to figure it out, but SQLDriverConnect is failing
because it's using the name of the 32-bit driver, since it got the DSN
from a 32-bit application. So there's no such driver as far as the
64-bit application is concerned.

(It didn't help that I couldn't enable system-wide ODBC tracing on the
system, for unrelated, annoying and as-yet-unresolved reasons involving
the ODBC driver manager.)

Anyway - it looks like it'll be necessary to figure out in pgxalib.dll
when this is happening and remap the driver name. That seems pretty
crude, though, so I'm looking for better ideas.
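A minimal sketch of what such a remapping could look like, as plain string
handling on the connection string. RemapDriverTo64Bit is a hypothetical
helper, not an existing pgxalib function, and the "(x64)" suffix is an
assumption about how the 64-bit driver is registered; the real names should
be checked in the 64-bit ODBC Data Source Administrator. It does not handle
whitespace around "=" in the DRIVER attribute.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of pgxalib): rewrite the DRIVER attribute of
// an ODBC connection string so it names the 64-bit driver registration.
// ASSUMPTION: the 64-bit driver is registered under the 32-bit name plus a
// suffix such as "(x64)". Verify this against the installed drivers.
std::string RemapDriverTo64Bit(const std::string &connStr,
                               const std::string &suffix = "(x64)")
{
    // Lower-case copy so the keyword search is case-insensitive.
    std::string lower(connStr);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Find "driver=" at the start of the string or right after a ';'.
    const std::string key = "driver=";
    std::string::size_type pos = 0;
    while ((pos = lower.find(key, pos)) != std::string::npos) {
        if (pos == 0 || lower[pos - 1] == ';')
            break;
        pos += key.size();
    }
    if (pos == std::string::npos)
        return connStr;  // no DRIVER attribute (e.g. DSN=...), leave untouched

    // Locate the driver name, which may be wrapped in braces.
    std::string::size_type begin = pos + key.size();
    std::string::size_type end;
    if (begin < connStr.size() && connStr[begin] == '{') {
        ++begin;
        end = connStr.find('}', begin);
    } else {
        end = connStr.find(';', begin);
    }
    if (end == std::string::npos)
        end = connStr.size();

    // Leave the string alone if the name already ends with the suffix.
    const std::string name = connStr.substr(begin, end - begin);
    if (name.size() >= suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0)
        return connStr;

    std::string result(connStr);
    result.insert(end, suffix);
    return result;
}
```

The idea would be to apply this only when the first SQLDriverConnect attempt
fails, then retry with the rewritten string, so connection strings that
already work are never touched.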

I'll follow up when it's not midnight with:

- a patch to add proper error diagnostics in pgxalib.dll on connection
failure;

- results of testing a hack that just mangles the DSN connection string
manually, as a proof of concept to show that this is really the issue; and

- if I can figure out how to do it the right way (as opposed to just
abusing a breakpoint to set the lvalue on return like I ended up doing),
some documentation on how to turn pgxalib tracing on.

As part of this I've been wondering whether it's possible to deal with
that exit race condition. I'm not sure how to tackle that - I don't
speak fluent COM or OLE. Do you think it'd be legal to delay in the
IAsyncPG dtor until either we confirm commit of a tx we know is in
flight or we hit a (short) timeout?
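To make the question concrete, here is a rough sketch of the "wait for
commit or a short timeout" idea using only the standard C++ library.
CommitLatch and PendingTx are hypothetical names, not psqlODBC classes, and
the sketch says nothing about whether blocking like this is permitted under
COM rules, which is exactly the open question.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: records whether the phase-two notification arrived.
class CommitLatch {
public:
    // Called from whatever handles the phase-two callback.
    void NotifyResolved() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            resolved_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until resolved or until the timeout expires.
    // Returns true if the transaction was resolved in time.
    bool WaitResolved(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return resolved_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool resolved_ = false;
};

// Hypothetical stand-in for an object in the role of IAsyncPG.
class PendingTx {
public:
    explicit PendingTx(std::chrono::milliseconds grace) : grace_(grace) {}

    // Bounded delay only: give up after the grace period so that process
    // exit is never blocked indefinitely.
    ~PendingTx() { latch_.WaitResolved(grace_); }

    // Would be invoked when the commit request is delivered.
    void OnCommitRequest() { latch_.NotifyResolved(); }

private:
    CommitLatch latch_;
    std::chrono::milliseconds grace_;
};
```

If the timeout expires, the transaction is still left for recovery to
resolve, so this would only narrow the race window, not close it.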

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
