From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Streaming replication, some small issues |
Date: | 2009-12-08 11:38:31 |
Message-ID: | 3f0b79eb0912080338m71505de4g1aa61e6229fc1666@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Dec 8, 2009 at 5:30 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> A couple of small issues spotted while reviewing the streaming
> replication patch:
Thanks for the review!
> - Because sentPtr is initialized to zeros, GetOldestWALSendPointer will
> return zero before a just-launched WAL sender has sent its first
> message. That can lead to WAL files that are still needed by another
> standby to be deleted prematurely.
Oops! I fixed that (in my git repository, see the bottom of this mail).
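To make the problem concrete, here is a minimal sketch of one conservative way to compute the oldest sent position. It is not the code in the repository: WalPos, SenderSlot, OldestStatus and oldest_sent_ptr are hypothetical names, and WalPos is only a simplified stand-in for XLogRecPtr. The idea is that an active sender whose position is still all zeros makes the result "unknown", so the caller keeps every WAL file instead of treating zero as a real position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a WAL position (log id + offset). */
typedef struct { uint32_t xlogid; uint32_t xrecoff; } WalPos;

/* Hypothetical per-sender slot; sent_ptr stays {0,0} until the first send. */
typedef struct { bool in_use; WalPos sent_ptr; } SenderSlot;

typedef enum {
    OLDEST_NONE,     /* no active senders: nothing to protect */
    OLDEST_UNKNOWN,  /* some active sender has not sent anything yet */
    OLDEST_FOUND     /* *oldest holds the minimum sent position */
} OldestStatus;

bool walpos_is_unset(WalPos p) { return p.xlogid == 0 && p.xrecoff == 0; }

bool walpos_less(WalPos a, WalPos b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff < b.xrecoff);
}

/* Scan all slots; refuse to report a minimum while any active sender
 * still carries the zero-initialized position. */
OldestStatus oldest_sent_ptr(const SenderSlot *slots, size_t n, WalPos *oldest)
{
    bool found = false;

    assert(oldest != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!slots[i].in_use)
            continue;                      /* empty slot, ignore */
        if (walpos_is_unset(slots[i].sent_ptr))
            return OLDEST_UNKNOWN;         /* just-launched sender */
        if (!found || walpos_less(slots[i].sent_ptr, *oldest)) {
            *oldest = slots[i].sent_ptr;
            found = true;
        }
    }
    return found ? OLDEST_FOUND : OLDEST_NONE;
}
```

With this shape, the caller would remove old WAL files only on OLDEST_FOUND (up to the returned position) or OLDEST_NONE, and would skip removal on OLDEST_UNKNOWN.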
> - If a WAL file is not found in the master for some reason, standby goes
> into an infinite loop retrying it:
>
> ERROR: could not read xlog records: FATAL: could not open file
> "pg_xlog/000000010000000000000000" (log file 0, segment 0): No such file
> or directory
http://archives.postgresql.org/pgsql-hackers/2009-09/msg01393.php
>> walreceiver shouldn't die on connection error, just to be restarted by
>> startup process. Can we add error handling a la bgwriter and have a
>> retry loop within walreceiver?
Taking your current and previous comments together, do you mean that
walreceiver should always retry connecting to the primary after
a connection error occurs in PQgetXLogData/PQputXLogRecPtr, and
exit when any other error occurs? I'm not sure we can determine
the error type precisely, though.
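The policy in question could be sketched roughly as follows. This is only an illustration under the assumption that errors can be classified at all, which is exactly the open point. ErrKind, Action, action_for and run_receiver are hypothetical names, and the loop replays scripted outcomes instead of calling PQgetXLogData/PQputXLogRecPtr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error classes; real classification may not be this clean. */
typedef enum { ERR_NONE, ERR_CONNECTION, ERR_OTHER } ErrKind;
typedef enum { ACT_CONTINUE, ACT_RECONNECT, ACT_EXIT } Action;

/* Policy: reconnect on connection errors, exit on anything else. */
Action action_for(ErrKind kind)
{
    switch (kind) {
        case ERR_NONE:       return ACT_CONTINUE;
        case ERR_CONNECTION: return ACT_RECONNECT;
        case ERR_OTHER:      return ACT_EXIT;
    }
    return ACT_EXIT;  /* unknown kinds are treated as fatal */
}

/* Replay a script of outcomes, one per receive attempt. Returns the number
 * of reconnects performed; *exited is set to 1 if a non-connection error
 * ended the loop before the script ran out. */
int run_receiver(const ErrKind *outcomes, size_t n, int *exited)
{
    int reconnects = 0;

    assert(exited != NULL);
    *exited = 0;
    for (size_t i = 0; i < n; i++) {
        Action act = action_for(outcomes[i]);

        if (act == ACT_EXIT) {
            *exited = 1;       /* give up; let the caller decide what next */
            break;
        }
        if (act == ACT_RECONNECT)
            reconnects++;      /* a real loop would sleep and reconnect here */
    }
    return reconnects;
}
```

A real retry loop would also need a delay between attempts so that a permanently missing WAL file does not turn into a busy loop.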
> - It's possible to shut down master, change max_wal_senders to 0,
> restart and do an operation like CLUSTER which then skips WAL-logging.
> Then shutdown, change max_wal_senders back to non-zero. All this while
> the standby is running. Leads to a corrupt standby.
I've treated this case as a known restriction. How do you think
we should cope with it?
1. Restriction: is documentation alone enough?
2. Safeguard:
- forbid the primary from performing such operations while the
standby is running?
- emit a PANIC error on the standby if a primary that has lost
sync restarts?
3. Full solution: is an automatic resync mechanism required?
> I've also pushed a couple of small cosmetic changes to replication
> branch at git://git.postgresql.org/git/users/heikki/postgres.git
Your changes seem good.
I pulled and merged your changes into my repository:
git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication
I also pushed support for replicating a backup history file
to the repository.
> I'll continue reviewing...
Thanks a lot!
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center