From: | "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly |
Date: | 2016-11-22 03:18:50 |
Message-ID: | 0A3221C70F24FB45833433255569204D1F656653@G01JPEXMBYT05 |
Lists: | pgsql-hackers |
From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Amit Kapila
> I have tried using attached script multiple times on latest 9.2 code, but
> couldn't reproduce the issue. Please find the log attached with this mail.
> Apart from log file, below prints appear:
>
> WARNING: enabling "trust" authentication for local connections You can
> change this by editing pg_hba.conf or using the option -A, or --auth-local
> and --auth-host, the next time you run initdb.
> 20075/20075 kB (100%), 1/1 tablespace
> NOTICE: pg_stop_backup complete, all required WAL segments have been
> archived
> 20079/20079 kB (100%), 1/1 tablespace
>
> Let me know, if some parameters need to be tweaked to reproduce the issue?
>
>
> It seems that the patch proposed is good, but it is better if somebody other
> than you can reproduce the issue and verify if the patch fixes the same.
>
Thank you for reviewing the code and testing. Hmm, we could reproduce the problem on PostgreSQL 9.2.19. The script's stdout is attached as test.log, and the stderr is as follows:
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
20099/20099 kB (100%), 1/1 tablespace
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
20103/20103 kB (100%), 1/1 tablespace
The sizes that pg_basebackup reports are a bit different from yours, and I don't see a reason for this. The test script explicitly specifies the database encoding and locale, so an encoding difference doesn't seem to be the cause. The target problem occurs only when a WAL record crosses a WAL segment boundary, so a subtle change in WAL record volume would prevent the problem from happening.
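To illustrate why a small difference in WAL volume matters, here is a rough Python sketch of the boundary condition. It assumes the default 16 MB segment size and treats WAL positions as plain byte offsets, ignoring page headers and record alignment, so it is only an approximation of what the server does. The function name is made up for this example.

```python
# Assumption: default WAL segment size of 16 MB; positions are plain byte
# offsets, and page headers / alignment padding are ignored for simplicity.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def crosses_segment_boundary(start, length, seg_size=WAL_SEGMENT_SIZE):
    """Return True if the bytes [start, start + length) span two or more segments.

    This is a hypothetical helper, not a PostgreSQL function.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first_seg = start // seg_size
    last_seg = (start + length - 1) // seg_size
    return first_seg != last_seg
```

With this model, a 100-byte record starting 10 bytes before the end of a segment crosses the boundary, while the same record starting a few bytes earlier or later in the WAL stream may not. That would be consistent with the problem disappearing when the WAL volume shifts slightly.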
Anyway, could you retry with the attached test.sh? It just changes restore_command.
If the problem occurs, the following pair of lines appears in the server log of the cascading standby. Could you check for it?
LOG: restored log file "000000020000000000000003" from archive
LOG: out-of-sequence timeline ID 1 (after 2) in log file 0, segment 3, offset 0
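In case it helps with checking, below is a small Python sketch that scans server log lines for that pair. It is only an illustration: the function names are invented here, and the patterns are based solely on the two messages quoted above. It also splits the 24-hex-digit WAL file name into timeline, log, and segment, which shows that the restored file belongs to timeline 2 while the message complains about timeline 1.

```python
import re

# Patterns taken from the two log messages quoted above.
RESTORED = re.compile(r'restored log file "([0-9A-Fa-f]{24})" from archive')
OUT_OF_SEQ = re.compile(
    r"out-of-sequence timeline ID (\d+) \(after (\d+)\) "
    r"in log file (\d+), segment (\d+), offset (\d+)"
)


def parse_wal_name(name):
    """Split a WAL file name into (timeline, log, segment) integers.

    The name is 24 hex digits: 8 for the timeline, 8 for the log, 8 for the segment.
    """
    if not re.fullmatch(r"[0-9A-Fa-f]{24}", name):
        raise ValueError("not a WAL segment file name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def find_symptom(lines):
    """Return (wal_name, (tli, prev_tli, log, seg, offset)) for each place where an
    out-of-sequence message directly follows a 'restored log file' message."""
    hits = []
    prev_restored = None  # WAL name from the immediately preceding line, if any
    for line in lines:
        m = OUT_OF_SEQ.search(line)
        if m and prev_restored is not None:
            hits.append((prev_restored, tuple(int(g) for g in m.groups())))
        r = RESTORED.search(line)
        prev_restored = r.group(1) if r else None
    return hits
```

For example, `find_symptom(open(path))` could be run over the cascading standby's log file, where `path` is wherever your setup writes the server log. An empty result would mean the pair did not appear.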
Regards
Takayuki Tsunakawa
Attachment | Content-Type | Size |
---|---|---|
test.sh | application/octet-stream | 3.3 KB |
test.log | application/octet-stream | 1.4 KB |