Re: Problem in SQLFreeHandle (Statement)

From: Jade Koskela <jkoskela0(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-odbc(at)postgresql(dot)org
Subject: Re: Problem in SQLFreeHandle (Statement)
Date: 2014-06-19 22:26:02
Message-ID: CAN5ZvqwMXGdmz+pYLSuaAVNgBZ1og7aBG0imKNMCfT8NtD0OJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-odbc

Thanks Heikki !

This has solved the problem. I'm still curious as to the root cause.
According to the SUSv3, which OSX/Darwin is supposed to conform to,
recursive mutexes should return error codes if unlocked from a different
thread, but it is not undefined behavior (it is only undefined for normal
mutexes). I tried to test this by doing a bunch of locking/unlocking of the
same mutex from different threads and couldn't see any problems. However,
if I compile without support for recursive mutexes then the problem
disappears. Maybe we'll never know.

Anyways back to other work, thanks again.
Jade Koskela
Tableau Software

On Thu, Jun 19, 2014 at 5:33 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 06/17/2014 11:06 PM, Jade Koskela wrote:
>
>> I couldn't reproduce the behavior using the test program, or a similar
>> one.
>> I recompiled the driver to enable extra logging.
>> Here is the grep on ENTER_CONN_CS, the second address is the address of
>> the
>> mutex (pthread recursive).
>> This was compiled on OSX with Clang, and then also with GCC with the same
>> result. Behavior doesn't reproduce on Windows.
>>
>>
>> [140735257568016][[SQLFreeHandle]]
>>
>> [140735257568016]*** ENTER_CONN_CS [140210608634616] ***
>>
>> [140735257568016]*** ENTER_CONN_CS [140210608634616] ***
>>
>> [140735257568016]*** LEAVE_CONN_CS [140210608634616] ***
>>
>> [140735257568016]*** LEAVE_CONN_CS [140210608634616] ***
>>
>> [140735257568016]*** LEAVE_CONN_CS [140210608634616] ***
>>
>> [140735257568016][SQLGetConnectAttrW]
>>
>> [140735257568016]*** ENTER_CONN_CS [140210608634616] ***
>>
>> [140735257568016]*** LEAVE_CONN_CS [140210608634616] ***
>>
>> [140735257568016][SQLGetConnectAttrW]
>>
>> [140735257568016]*** ENTER_CONN_CS [140210608634616] ***
>>
>> [140735257568016]*** LEAVE_CONN_CS [140210608634616] ***
>>
>> [140735257568016][SQLGetConnectAttrW]
>>
>> [140735257568016]*** ENTER_CONN_CS [140210608634616] ***
>>
>> [140735257568016]*** LEAVE_CONN_CS [140210608634616] ***
>>
>> [4466282496][[SQLAllocHandle]]
>>
>> [4466282496]*** ENTER_CONN_CS [140210608634616] *** <- new thread
>> grabbed mutex, count=1
>>
>> [4466282496]**** PGAPI_AllocStmt: hdbc = 0x7f8553857000, stmt =
>> 0x7f8552d17100
>>
>> [4466282496]*** LEAVE_CONN_CS [140210608634616] *** <- count=0
>>
>> [4466282496]**** PGAPI_Prepare: STMT_ALLOCATED, copy
>>
>> [4466282496]*** ENTER_CONN_CS [140210608634616] *** <- count=1
>>
>> [4466282496]*** ENTER_CONN_CS [140210608634616] *** <- count=2
>>
>> [140735257568016]*** LEAVE_CONN_CS [140210608634616] *** <- this thread
>> doesn't own, count=2
>>
>
> A-ha. The problem is that after issuing the query cancel, the SQLCancel()
> function calls DiscardStatementSvp(), to discard any savepoints associated
> with the statement. SetStatementSvp() and DiscardStatementSvp() maintain a
> counter of how many times the CONN_CS has been acquired, and
> DiscardStatementSvp() tries to completely release it by releasing it as
> many times as the counter says it's been locked. However, unlocking a
> pthread mutex from a different thread than the one that locked it is
> undefined behavior. Apparently it happens to work on other platforms by
> accident, but OS X is somehow more picky about it.
>
> We never actually check the return value of pthread_mutex_lock() /
> pthread_mutex_unlock(), so it's possible that they are reporting errors
> even on other platforms, and we just don't pay attention to it. That ought
> to be fixed too, at least in debug builds. Also for debugging purposes, I
> think we should add a thread ID field next to the lock_CC_for_rb field to
> remember which thread is holding the locks, as a sanity check.
>
> Attached are two patches. The first one is a narrow fix, which should fix
> the bug. It simply removes the DiscardStatementSvp() call. AFAICS it's
> never needed: if the statement was busy executing in a different thread
> (this is the case you hit), we send a query cancel request to the server,
> and it's not cool to do anything else with the statement. And in the
> codepaths where the statement was not busy executing, PGAPI_Cancel()
> already called DiscardStatementSvp(), with the appropriate lock held.
>
> The second patch refactors PGAPI_Cancel() and adds comments, to make it
> more readable. It's not absolutely required, but I found PGAPI_Cancel()
> very difficult to debug as it is.
>
> Can you test these patches, to confirm that it fixes the bug you're seeing?
>
> - Heikki
>
>

In response to

Browse pgsql-odbc by date

  From Date Subject
Next Message Craig Ringer 2014-06-20 00:27:50 Re: Removing support for v1 and v2 protocols?
Previous Message Barry Bell 2014-06-19 15:14:08 Out of memory while reading tuples error