From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | jhedden(at)apple(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown |
Date: | 2014-03-13 18:59:45 |
Message-ID: | CAHGQGwFyPDd_7hZFNKOiHUTY9LFK1GhJKOqUAuRLY2j+CPHg+w@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Mar 13, 2014 at 7:52 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Sorry for the delay...
>
> On Thu, Feb 6, 2014 at 5:05 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> On 02/06/2014 05:08 AM, jhedden(at)apple(dot)com wrote:
>>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference: 9118
>>> Logged by: Joel Hedden
>>> Email address: jhedden(at)apple(dot)com
>>> PostgreSQL version: 9.3.2
>>> Operating system: Mac OS X 10.9.1
>>> Description:
>>>
>>> I connect a pg_receivexlog instance and have "hot_standby" archiving
>>> enabled, with "archive_command" defined correctly. When the WAL Sender
>>> process receives a SIGUSR2 from the postmaster (or me), it fails to shut
>>> down and pg_receivexlog remains connected. Upon inspection, it looks like
>>> the test for "sentPtr == MyWalSnd->flush" is always false at
>>> walsender.c:1058 (sentPtr is still non-zero) where the wal sender should
>>> be shutting down. Replication and archiving seem to be working otherwise.
>>> Killing pg_receivexlog allows for the WAL Sender to terminate.
>>
>>
>> Hmm. Before exiting, walsender waits until the client has flushed all the
>> WAL to disk. However, pg_receivexlog never sends a "flush" pointer back to
>> the server, so the server waits forever.
>>
>> The first question is, why does pg_receivexlog not send its "flush" pointer
>> back to the server? It *does* fsync the files to disk. However, currently it
>> only fsyncs when closing a full segment, but when shutting down, the last
>> segment would not be full, so to fix this issue it should be taught to fsync
>> also partial segments.
>
> Yes. And, pg_receivexlog returns InvalidXLogRecPtr as the flush location,
> so "sentPtr == MyWalSnd->flush" will never be true when using pg_receivexlog...
> The quick-fix seems not to wait for that condition to be true whenever the flush
> location is invalid.

On second thought, I think it's better to check the write location instead
when walsender is connected to a client such as pg_receivexlog, which
always returns an invalid flush location. The attached patch does this. Thoughts?
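
To make the proposed condition concrete, here is a minimal sketch of the idea; it is not the attached patch. The helper name client_caught_up is hypothetical, and the XLogRecPtr typedef and InvalidXLogRecPtr macro are simplified stand-ins so that the snippet compiles on its own. It assumes the client reports both a write and a flush location, and falls back to the write location only when the flush location is invalid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions (assumption). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical helper: returns true once the client has confirmed
 * everything that was sent. If the client never reports a flush
 * location (as pg_receivexlog does), compare against the write
 * location instead of waiting forever.
 */
static bool
client_caught_up(XLogRecPtr sent_ptr, XLogRecPtr write_ptr,
                 XLogRecPtr flush_ptr)
{
    XLogRecPtr reply_ptr;

    reply_ptr = (flush_ptr == InvalidXLogRecPtr) ? write_ptr : flush_ptr;
    return sent_ptr == reply_ptr;
}
```

With this check, a client that reports write = sent and flush = invalid is treated as caught up, while a client that does report flush locations is still held to the stricter "sentPtr == flush" test.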
Regards,
--
Fujii Masao
Attachment | Content-Type | Size |
---|---|---|
fix_shutdown_and_receivexlog_problem_v1.patch | text/x-diff | 987 bytes |