From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | src/test/subscription/t/005_encoding.pl is broken |
Date: | 2017-09-18 15:50:06 |
Message-ID: | 27032.1505749806@sss.pgh.pa.us |
Lists: | pgsql-hackers |
To reproduce the subscription-startup hang that Thomas Munro observed,
I changed src/backend/replication/logical/launcher.c like this:
@@ -427,7 +427,8 @@ retry:
     bgw.bgw_notify_pid = MyProcPid;
     bgw.bgw_main_arg = Int32GetDatum(slot);
 
-    if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+    if (random() < 1000000000 ||
+        !RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
     {
         /* Failed to start worker, so clean up the worker slot. */
         LWLockAcquire(LogicalRepWorkerLock, LW_EXCLUSIVE);
This causes about 50% of worker launch requests to fail.
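As a rough check on that figure: assuming random() here is the POSIX
function, which returns values uniformly distributed over 0 .. 2^31 - 1,
the fraction of calls that fall below the injected threshold can be
computed directly. The helper name failure_fraction below is hypothetical
and exists only for this illustration; it is not part of PostgreSQL.

```c
#include <stdio.h>

/*
 * Assumption: random() is POSIX random(), uniform over 0 .. 2^31 - 1,
 * so it has 2^31 distinct possible return values.
 */
#define RANDOM_RANGE 2147483648.0

/*
 * Hypothetical helper: fraction of random() results that are strictly
 * less than "threshold", i.e. the chance that the injected condition
 * "random() < threshold" forces the failure path.
 */
static double
failure_fraction(long threshold)
{
    return (double) threshold / RANDOM_RANGE;
}

/* Print the forced-failure rate as a percentage. */
static void
print_failure_rate(long threshold)
{
    printf("forced failure rate: %.1f%%\n",
           100.0 * failure_fraction(threshold));
}
```

Under that assumption, print_failure_rate(1000000000L) reports about
46.6%, which is consistent with "about 50%". Genuine failures of
RegisterDynamicBackgroundWorker would add to that rate.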
With the fix I just committed, 002_types.pl gets through fine,
but 005_encoding.pl does not; it sometimes fails like this:
t/005_encoding.pl ..... 1/1
# Failed test 'data replicated to subscriber'
# at t/005_encoding.pl line 49.
# got: ''
# expected: '1'
# Looks like you failed 1 test of 1.
t/005_encoding.pl ..... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests
The reason seems to be that its method of waiting for replication
to happen is completely inappropriate. It's watching for the master
to say that the slave has received all the WAL, but that does not
ensure that the logicalrep apply workers have caught up, does it?
regards, tom lane