Race condition in WaitForBackgroundWorkerStartup

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Race condition in WaitForBackgroundWorkerStartup
Date: 2018-11-12 18:25:07
Message-ID: CAMa1XUhAES394JA=m=n5GWnLro7JnetZXe0RZYJ3X5piQ8cKaA@mail.gmail.com
Lists: pgsql-hackers

I believe I found a race condition in WaitForBackgroundWorkerStartup in the
case where the worker encounters an ERROR during startup. Depending on the
speed of the system, the function unpredictably returns either BGWH_STOPPED
or BGWH_STARTED. However, I can reliably reproduce getting BGWH_STOPPED by
tweaking the worker_spi.c test module.

On my own system running 11.1 (or any other version of pg, actually), it
returns BGWH_STOPPED and thus a hard error message (ERROR: could not start
background process). But for other colleagues, it returns BGWH_STARTED and
thus the client sees the pid that was launched. In that case the error
appears only in the server logs, as the process exits.

Here is the relevant section of worker_spi.c:395-398:

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        PG_RETURN_NULL();

    status = WaitForBackgroundWorkerStartup(handle, &pid);

First, I hacked the SQL in the worker_spi_main function to be invalid. Then
I see one behavior or the other (a pid result or an ERROR) depending on
whose system it runs on.

Then I added an arbitrary sleep before the WaitForBackgroundWorkerStartup
call, and with that in place it reliably shows an ERROR message every time.
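
The timing dependence can be illustrated with a toy model. This is only a
sketch and not PostgreSQL code: the SimStatus type, the sim_* functions, and
the millisecond timeline are all hypothetical names invented for the
illustration. It assumes the worker becomes visible as started at start_ms,
fails and exits at exit_ms, and that the waiter simply returns the first
status it sees that is not "not yet started".

```c
#include <assert.h>

/* Hypothetical stand-ins for the BGWH_* states; not the real enum. */
typedef enum
{
    SIM_NOT_YET_STARTED,
    SIM_STARTED,
    SIM_STOPPED
} SimStatus;

/*
 * Status an observer would see at time now_ms, assuming the worker is
 * launched at start_ms and exits (after its ERROR) at exit_ms.
 */
static SimStatus
sim_status_at(int now_ms, int start_ms, int exit_ms)
{
    assert(start_ms <= exit_ms);

    if (now_ms < start_ms)
        return SIM_NOT_YET_STARTED;
    if (now_ms < exit_ms)
        return SIM_STARTED;
    return SIM_STOPPED;
}

/*
 * Simplified waiter: begins checking at first_check_ms and advances one
 * tick at a time until the status is no longer "not yet started".
 */
static SimStatus
sim_wait_for_startup(int first_check_ms, int start_ms, int exit_ms)
{
    int now_ms = first_check_ms;

    while (sim_status_at(now_ms, start_ms, exit_ms) == SIM_NOT_YET_STARTED)
        now_ms++;

    return sim_status_at(now_ms, start_ms, exit_ms);
}
```

In this model, a waiter whose first check falls before exit_ms reports
SIM_STARTED, while one whose first check is delayed past exit_ms (as with the
added sleep) reports SIM_STOPPED. The real function waits on a latch rather
than polling, so a late wakeup could presumably have the same effect as a
late first check.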

I'm not sure if this is substantial or not, but it's causing me a problem:
I am regression testing an invalid background worker launch and can't
rely on getting the same output each time.

This was my original post:
https://www.postgresql.org/message-id/CAMa1XUhFap+AibpAHSkjRwN4cd9o8KYghWtG99JNofrEDzsAGw@mail.gmail.com

Now that I have figured out the issue, and that it's unrelated to my
extension, I thought it warranted a separate thread. I am not sure how
best to solve this issue.

Thanks!
Jeremy
