PostgreSQL 9.2.5 Visual Studio Build does not pass the regression tests.

From: "Shiv Shivaraju Gowda (shivshi)" <shivshi(at)cisco(dot)com>
To: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: PostgreSQL 9.2.5 Visual Studio Build does not pass the regression tests.
Date: 2013-11-22 01:59:58
Message-ID: CEB3FA9D.6450%shivshi@cisco.com
Lists: pgsql-bugs


There seems to be a dormant bug in the PostgreSQL source code when it is built with Visual Studio (VS). It shows up on newer OS versions (I am not sure whether it is the OS or something else in the environment). The MinGW build does not use this code path and thus does not hit this bug.

SYMPTOM:
The following symptom is encountered in PostgreSQL builds made with Visual Studio: the regression tests fail, with the server crashing repeatedly and logging this message: "PANIC: could not lock semaphore: error code 0". The issue occurs with executables built by VS 2005, VS 2008 (32-bit and 64-bit), VS 2010 and VS 2012. It was reproducible with PostgreSQL 9.2.3 and 9.2.5. We did not encounter it in the MinGW build or in the EnterpriseDB packaged executables (which seem to have been built using Visual Studio 2010).

CAUSE:
The PGSemaphoreLock function in postgresql-9.2.5\src\backend\port\win32_sema.c (https://github.com/postgres/postgres/blob/master/src/backend/port/win32_sema.c) uses the "Ex" version of the Windows "WaitForMultipleObjects" function, WaitForMultipleObjectsEx (http://msdn.microsoft.com/en-us/library/windows/desktop/ms687028%28v=vs.85%29.aspx), in alertable mode ("bAlertable"), but it does not handle the extra wake-ups that an alertable wait allows. Specifically, it does not handle the WAIT_IO_COMPLETION return code, which is returned when the wait is interrupted by a user-mode Asynchronous Procedure Call (APC) or an async I/O completion routine.
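A simplified, portable model of the return-code dispatch may make the missing case easier to see. It assumes the two-handle layout of PGSemaphoreLock (index 0 is the backend's signal event, index 1 is the semaphore); the wait_outcome enum and classify_wait_result are hypothetical names, not PostgreSQL code. On non-Windows hosts the documented Win32 constant values are defined locally so the sketch compiles anywhere.

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Documented Win32 values, defined locally only so the sketch builds elsewhere. */
typedef unsigned long DWORD;
#define WAIT_OBJECT_0      0x00000000UL
#define WAIT_IO_COMPLETION 0x000000C0UL
#define WAIT_FAILED        0xFFFFFFFFUL
#endif

/* Hypothetical outcomes of one alertable wait on {signal event, semaphore}. */
typedef enum
{
    OUTCOME_SIGNAL,   /* index 0 signaled: deliver queued signals, wait again */
    OUTCOME_ACQUIRED, /* index 1 signaled: the semaphore is ours */
    OUTCOME_RETRY,    /* an APC or I/O completion routine ran: wait again */
    OUTCOME_ERROR     /* anything else: a genuine failure */
} wait_outcome;

/*
 * Classify a WaitForMultipleObjectsEx() result for the two-handle wait.
 * Without the WAIT_IO_COMPLETION branch, that code would land in
 * OUTCOME_ERROR, which matches the behaviour described in this report.
 */
wait_outcome
classify_wait_result(DWORD ret)
{
    if (ret == WAIT_OBJECT_0)
        return OUTCOME_SIGNAL;
    if (ret == WAIT_OBJECT_0 + 1)
        return OUTCOME_ACQUIRED;
    if (ret == WAIT_IO_COMPLETION) /* only possible when bAlertable is TRUE */
        return OUTCOME_RETRY;
    return OUTCOME_ERROR;          /* e.g. WAIT_FAILED */
}
```

Because WAIT_IO_COMPLETION is a normal return rather than a failure, GetLastError() has nothing to report if the code falls through to its error path, which would plausibly explain the "error code 0" in the PANIC message.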

The part I do NOT understand is why the user-mode APC or async I/O completion triggers fire only in the executables we built, and not in the ones built by EnterpriseDB or by other users, since no one else has complained about it. Whether or not it is triggered, the code should handle all the return codes of the WaitForMultipleObjectsEx API, which is why I think this is a bug.

I checked the source code for calls that would trigger async I/O completion or a user-mode APC (ReadFileEx<http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468(v=vs.85).aspx>, WriteFileEx<http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748(v=vs.85).aspx>, QueueUserAPC<http://msdn.microsoft.com/en-us/library/windows/desktop/ms684954(v=vs.85).aspx>) and could not find any. I am not sure what causes the WAIT_IO_COMPLETION return code, and I could not find a way to determine that in the debug environment.

(Possible) FIX:
Either of the following changes to the PGSemaphoreLock function works and passes the regression tests.

1) Replace the call to WaitForMultipleObjectsEx with WaitForMultipleObjects.
2) Handle the WAIT_IO_COMPLETION return code the same way as WAIT_OBJECT_0. There is similar code in socket.c, so this change should be safe too.
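Fix (2) amounts to looping back into the wait instead of treating WAIT_IO_COMPLETION as an error. The sketch below models that retry loop portably; the wait and signal-dispatch steps are injected as hypothetical callbacks (wait_fn, dispatch_fn) so the control flow can be exercised without Windows, and lock_with_retry, wait_script, scripted_wait and count_dispatch are illustrative names, not PostgreSQL code. It assumes the same two-handle layout (index 0 the signal event, index 1 the semaphore).

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Documented Win32 values, defined locally only so the sketch builds elsewhere. */
typedef unsigned long DWORD;
#define WAIT_OBJECT_0      0x00000000UL
#define WAIT_IO_COMPLETION 0x000000C0UL
#define WAIT_FAILED        0xFFFFFFFFUL
#endif

/* Hypothetical hooks standing in for the real wait and signal dispatch. */
typedef DWORD (*wait_fn)(void *ctx);
typedef void (*dispatch_fn)(void *ctx);

/*
 * Keep waiting until the semaphore (index 1) is acquired. A pending signal
 * (index 0) is dispatched and the wait retried; WAIT_IO_COMPLETION is
 * simply retried. Returns 0 on success, -1 on a genuine wait failure.
 */
int
lock_with_retry(wait_fn wait, dispatch_fn dispatch, void *ctx)
{
    for (;;)
    {
        DWORD ret = wait(ctx);

        if (ret == WAIT_OBJECT_0)
            dispatch(ctx);     /* signal event set: service it, then retry */
        else if (ret == WAIT_OBJECT_0 + 1)
            return 0;          /* we own the semaphore */
        else if (ret == WAIT_IO_COMPLETION)
            continue;          /* APC or completion routine ran: retry */
        else
            return -1;         /* WAIT_FAILED or an unexpected code */
    }
}

/* Scripted fake: replays a fixed sequence of wait results. */
struct wait_script
{
    const DWORD *results;
    size_t       count;
    size_t       next;
    int          dispatched;
};

DWORD
scripted_wait(void *ctx)
{
    struct wait_script *s = ctx;

    /* Past the end of the script, report failure instead of reading out of bounds. */
    return s->next < s->count ? s->results[s->next++] : WAIT_FAILED;
}

void
count_dispatch(void *ctx)
{
    ((struct wait_script *) ctx)->dispatched++;
}
```

For example, a script of {WAIT_IO_COMPLETION, WAIT_OBJECT_0, WAIT_OBJECT_0 + 1} makes lock_with_retry return 0 after one dispatch, where logic without the WAIT_IO_COMPLETION branch would give up on the first result. Whether that branch should also dispatch signals, as a literal reading of "same as WAIT_OBJECT_0" suggests, is left open here; the essential change is that the wait is retried.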
