From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: several problems in pg_receivexlog |
Date: | 2012-07-12 16:07:13 |
Message-ID: | CAHGQGwHei6Y92YaL=gVmXiMPTYY+5xu177g6xjOUTrne4uSSjQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jul 12, 2012 at 8:39 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Jul 10, 2012 at 7:03 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Tue, Jul 10, 2012 at 3:23 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Hi,
>>>
>>> I found several problems in pg_receivexlog, e.g., memory leaks,
>>> file-descriptor leaks, etc. The attached patch fixes these problems.
>>>
>>> ISTM there are still some other problems in pg_receivexlog, so I'll
>>> read it deeply later.
>>
>> While pg_basebackup background process is streaming WAL records,
>> if its replication connection is terminated (e.g., walsender in the server
>> is accidentally terminated by SIGTERM signal), pg_basebackup ends
>> up failing to include all required WAL files in the backup. The problem
>> is that, in this case, pg_basebackup doesn't emit any error message at all.
>> So a user might misunderstand that a base backup has been successfully
>> taken even though it doesn't include all required WAL files.
>
> Ouch. That is definitely a bug if it behaves that way.
>
>
>> To fix this problem, I think that, when the replication connection is
>> terminated, ReceiveXlogStream() should check whether we've already
>> reached the stop point by calling stream_stop() before returning TRUE.
>> If we've not yet (this means that we've not received all required WAL
>> files yet), ReceiveXlogStream() should return FALSE and
>> pg_basebackup should emit an error message. Comments?
>
> Doesn't it already return false because it detects the error of the
> connection? What's the codepath where we end up returning true even
> though we had a connection failure? Shouldn't that end up under the
> "could not read copy data" branch, which already returns false?
You're right. If an error is detected, that function always returns false
and an error message is emitted, so that case is OK (though I think the
current error message, "pg_basebackup: child process exited with error 1",
is confusing). But if the walsender in the server is terminated by SIGTERM,
no error is detected: the pg_basebackup background process simply gets out
of the loop in ReceiveXlogStream() and returns true.
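
To make the failure mode concrete, here is a minimal, self-contained C sketch.
It is not the real pg_basebackup code: fake_conn, fake_read(),
reached_stop_point() and receive_stream() are hypothetical stand-ins for the
replication connection, the copy-data read, stream_stop() and
ReceiveXlogStream() respectively, and WAL positions are simplified to plain
byte counts. It assumes the stream can end without any error being reported,
as described above, and shows the proposed stop-point check before returning
true.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical simplification: a WAL position is just a byte count. */
typedef unsigned long long wal_pos;

/* Hypothetical fake connection: hands out a fixed list of chunk sizes,
 * then ends cleanly, the way the stream ends when no error is detected. */
typedef struct
{
    const wal_pos *chunks;  /* sizes of the WAL chunks the "server" sends */
    int            nchunks; /* number of entries in chunks */
    int            next;    /* index of the next chunk to deliver */
} fake_conn;

/* Returns true and sets *len if a chunk was read, false on a clean end.
 * The real error path is omitted here; it already returns false. */
static bool
fake_read(fake_conn *conn, wal_pos *len)
{
    if (conn->next >= conn->nchunks)
        return false;
    *len = conn->chunks[conn->next++];
    return true;
}

/* Stand-in for the stream_stop() check: have we received enough WAL? */
static bool
reached_stop_point(wal_pos received, wal_pos stop_point)
{
    return received >= stop_point;
}

/* Stand-in for ReceiveXlogStream(): returns true only if the stop point
 * was reached, even when the stream ends without a detected error. */
static bool
receive_stream(fake_conn *conn, wal_pos stop_point)
{
    wal_pos received = 0;
    wal_pos len = 0;

    while (fake_read(conn, &len))
    {
        received += len;
        if (reached_stop_point(received, stop_point))
            return true;    /* normal completion */
    }

    /* The loop ended with no error reported. Without this check the
     * function would return true here even though WAL is missing. */
    if (!reached_stop_point(received, stop_point))
    {
        fprintf(stderr,
                "replication stream ended at %llu before stop point %llu\n",
                received, stop_point);
        return false;
    }
    return true;
}
```

With three 100-byte chunks and a stop point of 300, receive_stream() returns
true. If the stream ends after only two chunks, it returns false and prints an
error instead of silently reporting success.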
Regards,
--
Fujii Masao