Re: Strange replication problem - segment restored from archive but still requested from master

From: Piotr Gasidło <quaker(at)barbara(dot)eu(dot)org>
To: Venkata Balaji N <nag1010(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Strange replication problem - segment restored from archive but still requested from master
Date: 2015-05-22 08:08:35
Message-ID: CAF8akQuesk5GZ37QNE=Up_WF8SG=ybard0H_yL98VGPwGKRExA@mail.gmail.com
Lists: pgsql-general pgsql-hackers

2015-05-22 2:16 GMT+02:00 Venkata Balaji N <nag1010(at)gmail(dot)com>:
>
> It might be yelling about the WAL segment due to the delay in shipping it
> from master to slave.
> Do you have the restore_command set up in the recovery.conf file ? do you
> have any automated job which is shipping WAL archives from master to slave.

Yes to both. I've used OmniPITR, with heavy logging. So this is on the master:

postgresql.conf:

archive_mode = on
archive_command =
'/usr/local/pgsql/scripts/omnipitr/bin/omnipitr-archive --config-file
/usr/local/pgsql/core/scripts/omnipitr/config/archive "%p"'

/usr/local/pgsql/core/scripts/omnipitr/config/archive:

--verbose
--data-dir=/usr/local/pgsql/core/data/
--dst-remote=pgsql(at)d0(dot)XXX:/usr/local/pgsql/core/wal/
--dst-remote=pgsql(at)d8(dot)XXX:/usr/local/pgsql/core/wal/
--state-dir=/usr/local/pgsql/core/scripts/omnipitr/state
--log=/usr/local/pgsql/core/scripts/omnipitr/log/omnipitr.log
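
(As a side note, a quick way to sanity-check that archiving keeps up;
these commands are a sketch rather than anything from my history, and
pg_stat_archiver needs 9.4+:)

# on the master: force a segment switch, then check the archiver status
psql -U pgsql -c "SELECT pg_switch_xlog();"
psql -U pgsql -c "SELECT last_archived_wal, last_failed_wal FROM pg_stat_archiver;"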

/usr/local/pgsql/core/scripts/omnipitr/log/omnipitr.log, part:

2015-05-21 20:35:32.404395 +0200 : 98206 : omnipitr-archive : LOG :
Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D0000008E
successfully sent to all destinations.
2015-05-21 20:35:59.993614 +0200 : 98214 : omnipitr-archive : LOG :
Called with parameters: --config-file
/usr/local/pgsql/core/scripts/omnipitr/config/archive
pg_xlog/0000000400004C4D0000008F
2015-05-21 20:36:00.495839 +0200 : 98214 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D0000008F to
pgsql(at)d0(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D0000008F ended
in 0.500619s
2015-05-21 20:36:00.896906 +0200 : 98214 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D0000008F to
pgsql(at)d8(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D0000008F ended
in 0.396684s
2015-05-21 20:36:00.897467 +0200 : 98214 : omnipitr-archive : LOG :
Timer [Segment delivery] took: 0.903s
2015-05-21 20:36:00.901777 +0200 : 98214 : omnipitr-archive : LOG :
Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D0000008F
successfully sent to all destinations.
2015-05-21 20:36:33.916879 +0200 : 98222 : omnipitr-archive : LOG :
Called with parameters: --config-file
/usr/local/pgsql/core/scripts/omnipitr/config/archive
pg_xlog/0000000400004C4D00000090
2015-05-21 20:36:34.379965 +0200 : 98222 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 to
pgsql(at)d0(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000090 ended
in 0.458563s
2015-05-21 20:36:34.931918 +0200 : 98222 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 to
pgsql(at)d8(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000090 ended
in 0.550890s
2015-05-21 20:36:34.935241 +0200 : 98222 : omnipitr-archive : LOG :
Timer [Segment delivery] took: 1.015s
2015-05-21 20:36:34.935776 +0200 : 98222 : omnipitr-archive : LOG :
Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090
successfully sent to all destinations.
2015-05-21 21:12:38.242687 +0200 : 561 : omnipitr-archive : LOG :
Called with parameters: --config-file
/usr/local/pgsql/core/scripts/omnipitr/config/archive
pg_xlog/0000000400004C4D00000090
2015-05-21 21:12:38.425904 +0200 : 561 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 to
pgsql(at)d0(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000090 ended
in 0.181528s
2015-05-21 21:12:38.969018 +0200 : 561 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 to
pgsql(at)d8(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000090 ended
in 0.539258s
2015-05-21 21:12:38.971513 +0200 : 561 : omnipitr-archive : LOG :
Timer [Segment delivery] took: 0.728s
2015-05-21 21:12:38.973229 +0200 : 561 : omnipitr-archive : LOG :
Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090
successfully sent to all destinations.
2015-05-21 21:28:51.389622 +0200 : 1634 : omnipitr-archive : LOG :
Called with parameters: --config-file
/usr/local/pgsql/core/scripts/omnipitr/config/archive
pg_xlog/0000000400004C4D00000091
2015-05-21 21:28:52.457545 +0200 : 1634 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000091 to
pgsql(at)d0(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000091 ended
in 1.065172s
2015-05-21 21:28:52.908541 +0200 : 1634 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000091 to
pgsql(at)d8(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000091 ended
in 0.449934s
2015-05-21 21:28:52.910337 +0200 : 1634 : omnipitr-archive : LOG :
Timer [Segment delivery] took: 1.519s
2015-05-21 21:28:52.910902 +0200 : 1634 : omnipitr-archive : LOG :
Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000091
successfully sent to all destinations.
2015-05-21 21:28:53.028408 +0200 : 1639 : omnipitr-archive : LOG :
Called with parameters: --config-file
/usr/local/pgsql/core/scripts/omnipitr/config/archive
pg_xlog/0000000400004C4D00000092
2015-05-21 21:28:53.783568 +0200 : 1639 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000092 to
pgsql(at)d0(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000092 ended
in 0.753511s
2015-05-21 21:28:54.260492 +0200 : 1639 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000092 to
pgsql(at)d8(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000092 ended
in 0.473037s
2015-05-21 21:28:54.261054 +0200 : 1639 : omnipitr-archive : LOG :
Timer [Segment delivery] took: 1.232s
2015-05-21 21:28:54.262764 +0200 : 1639 : omnipitr-archive : LOG :
Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000092
successfully sent to all destinations.
2015-05-21 21:30:06.034584 +0200 : 1870 : omnipitr-archive : LOG :
Called with parameters: --config-file
/usr/local/pgsql/core/scripts/omnipitr/config/archive
pg_xlog/0000000400004C4D00000093
2015-05-21 21:30:06.920507 +0200 : 1870 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000093 to
pgsql(at)d0(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000093 ended
in 0.883097s
2015-05-21 21:30:07.391485 +0200 : 1870 : omnipitr-archive : LOG :
Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000093 to
pgsql(at)d8(dot)XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000093 ended
in 0.469941s
2015-05-21 21:30:07.394826 +0200 : 1870 : omnipitr-archive : LOG :
Timer [Segment delivery] took: 1.358s

But now I see an unexpected thing: shipping of
0000000400004C4D00000090 ran twice. That was probably caused by me -
I tried to put the file from d0.XXX back into the master's pg_xlog and
ran CHECKPOINT on the master, but that didn't work out.
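
Roughly what I tried (reconstructed from memory, so treat the exact
commands as a sketch):

# copy the already-archived segment from d0.XXX back into the master's pg_xlog
scp pgsql@d0.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000090 \
    /usr/local/pgsql/core/data/pg_xlog/
# then force a checkpoint on the master, hoping the archiver would pick it up again
psql -U pgsql -c "CHECKPOINT;"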

I have WAL files:

d0:/usr/local/pgsql/core/wal # ls -laT 0000000400004C4D00000090 0000000400004C4D00000091 0000000400004C4D00000092 0000000400004C4D00000093
-rw------- 1 pgsql pgsql 16777216 May 21 20:36:33 2015 0000000400004C4D00000090
-rw------- 1 pgsql pgsql 16777216 May 21 21:28:51 2015 0000000400004C4D00000091
-rw------- 1 pgsql pgsql 16777216 May 21 21:28:52 2015 0000000400004C4D00000092
-rw------- 1 pgsql pgsql 16777216 May 21 21:30:05 2015 0000000400004C4D00000093

> Slave - if it does not find the WAL segment in master pg_xlog location, it
> looks for the same in the location mentioned in the restore_command. It
> yells if it is unable to find it.
> Cannot give any concrete comments/suggestions, until recovery.conf entries
> are known.

On the slave, in recovery.conf:

By the time I wrote this, I had switched the restore_command away from
OmniPITR. From:

restore_command =
'/usr/local/pgsql/scripts/omnipitr/bin/omnipitr-restore --config-file
/usr/local/pgsql/core/scripts/omnipitr/config/restore -sr "%f" "%p"'

To:

restore_command = 'cp /usr/local/pgsql/core/wal/%f %p'

And I manually rsynced the files from d0.XXX to the /usr/local/pgsql/core/wal/ directory.
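
Roughly like this (a sketch; the exact invocation is reconstructed, not
copied from shell history):

# on the slave (d8), as the pgsql user: pull the archived segments from d0.XXX
# into the directory the cp-based restore_command reads from
rsync -av pgsql@d0.XXX:/usr/local/pgsql/core/wal/ /usr/local/pgsql/core/wal/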

>> What was wrong?
>
>
> This error message can be ignored, did you try shipping
> 0000000400004C4D00000093 to slave from master's archive location and see if
> slave picked it up and applied ?

As I wrote, I had only shipped segments up to 0000000400004C4D00000092.
File 0000000400004C4D00000093 was archived after the slave managed to
start and successfully began streaming WAL.

But I did try this. When I noticed that the slave was OK, I stopped it,
changed back to the OmniPITR restore_command, and started it again.
Here is the full log:

-- here is where the slave successfully starts for the first time:
May 21 21:28:55 d8 postgres[53574]: [2-1] 2015-05-21 21:28:55.123 CEST
@ 53574 LOG: ending log output to stderr
May 21 21:28:55 d8 postgres[53574]: [2-2] 2015-05-21 21:28:55.123 CEST
@ 53574 HINT: Future log output will go to log destination
"syslog".
May 21 21:28:55 d8 postgres[53575]: [3-1] 2015-05-21 21:28:55.124 CEST
@ 53575 LOG: database system was shut down in recovery at
2015-05-21 21:24:56 CEST
May 21 21:28:55 d8 postgres[53575]: [4-1] 2015-05-21 21:28:55.126 CEST
@ 53575 LOG: entering standby mode
May 21 21:28:55 d8 postgres[53575]: [5-1] 2015-05-21 21:28:55.150 CEST
@ 53575 LOG: restored log file "0000000400004C4D00000088" from
archive
May 21 21:28:55 d8 postgres[53575]: [6-1] 2015-05-21 21:28:55.209 CEST
@ 53575 LOG: redo starts at 4C4D/88493B50
May 21 21:28:55 d8 postgres[53575]: [7-1] 2015-05-21 21:28:55.314 CEST
@ 53575 LOG: restored log file "0000000400004C4D00000089" from
archive
May 21 21:28:55 d8 postgres[53575]: [8-1] 2015-05-21 21:28:55.512 CEST
@ 53575 LOG: restored log file "0000000400004C4D0000008A" from
archive
May 21 21:28:55 d8 postgres[53585]: [3-1] 2015-05-21 21:28:55.781 CEST
pgsql(at)postgres 53585 [local] [unknown] FATAL: the database system is
starting up
May 21 21:28:55 d8 postgres[53575]: [9-1] 2015-05-21 21:28:55.831 CEST
@ 53575 LOG: restored log file "0000000400004C4D0000008B" from
archive
May 21 21:28:56 d8 postgres[53575]: [10-1] 2015-05-21 21:28:56.508
CEST @ 53575 LOG: restored log file "0000000400004C4D0000008C" from
archive
May 21 21:28:56 d8 postgres[53594]: [3-1] 2015-05-21 21:28:56.782 CEST
pgsql(at)postgres 53594 [local] [unknown] FATAL: the database system is
starting up
May 21 21:28:56 d8 postgres[53575]: [11-1] 2015-05-21 21:28:56.818
CEST @ 53575 LOG: restored log file "0000000400004C4D0000008D" from
archive
May 21 21:28:56 d8 postgres[53575]: [12-1] 2015-05-21 21:28:56.982
CEST @ 53575 LOG: restored log file "0000000400004C4D0000008E" from
archive
May 21 21:28:57 d8 postgres[53575]: [13-1] 2015-05-21 21:28:57.415
CEST @ 53575 LOG: restored log file "0000000400004C4D0000008F" from
archive
May 21 21:28:57 d8 postgres[53607]: [3-1] 2015-05-21 21:28:57.784 CEST
pgsql(at)postgres 53607 [local] [unknown] FATAL: the database system is
starting up
May 21 21:28:57 d8 postgres[53575]: [14-1] 2015-05-21 21:28:57.790
CEST @ 53575 LOG: restored log file "0000000400004C4D00000090" from
archive
May 21 21:28:58 d8 postgres[53575]: [15-1] 2015-05-21 21:28:58.328
CEST @ 53575 LOG: consistent recovery state reached at
4C4D/90FFF9C8
May 21 21:28:58 d8 postgres[53574]: [3-1] 2015-05-21 21:28:58.328 CEST
@ 53574 LOG: database system is ready to accept read only
connections
May 21 21:28:58 d8 postgres[53575]: [16-1] 2015-05-21 21:28:58.349
CEST @ 53575 LOG: restored log file "0000000400004C4D00000091" from
archive
May 21 21:28:58 d8 postgres[53618]: [4-1] 2015-05-21 21:28:58.786 CEST
pgsql(at)postgres 53618 [local] [unknown] FATAL: role "pgsql" does not
exist
May 21 21:28:59 d8 postgres[53575]: [17-1] 2015-05-21 21:28:59.962
CEST @ 53575 LOG: restored log file "0000000400004C4D00000092" from
archive
May 21 21:29:00 d8 postgres[53575]: [18-1] 2015-05-21 21:29:00.037
CEST @ 53575 LOG: unexpected pageaddr 4C45/DC000000 in log segment
0000000400004C4D00000093, offset 0
May 21 21:29:00 d8 postgres[53640]: [4-1] 2015-05-21 21:29:00.047 CEST
@ 53640 LOG: started streaming WAL from primary at 4C4D/92000000 on
timeline 4

-- I stopped it.
May 21 21:30:13 d8 postgres[53574]: [4-1] 2015-05-21 21:30:13.417 CEST
@ 53574 LOG: received fast shutdown request
May 21 21:30:13 d8 postgres[53574]: [5-1] 2015-05-21 21:30:13.418 CEST
@ 53574 LOG: aborting any active transactions
May 21 21:30:13 d8 postgres[53640]: [5-1] 2015-05-21 21:30:13.441 CEST
@ 53640 FATAL: terminating walreceiver process due to administrator
command
May 21 21:30:13 d8 postgres[53579]: [3-1] 2015-05-21 21:30:13.443 CEST
@ 53579 LOG: shutting down
May 21 21:30:13 d8 postgres[53579]: [4-1] 2015-05-21 21:30:13.443 CEST
@ 53579 LOG: restartpoint starting: shutdown immediate
May 21 21:30:15 d8 postgres[53579]: [5-1] 2015-05-21 21:30:15.094 CEST
@ 53579 LOG: restartpoint complete: wrote 25212 buffers (0.8%); 0
transaction log file(s) added, 0 removed, 4 recycled; write=0.946 s,
sync=0.629 s, total=1.651 s; sync files=66, longest=0.245 s, aver
May 21 21:30:15 d8 postgres[53579]: [6-1] 2015-05-21 21:30:15.094 CEST
@ 53579 LOG: recovery restart point at 4C4D/91A59348
May 21 21:30:15 d8 postgres[53579]: [6-2] 2015-05-21 21:30:15.094 CEST
@ 53579 DETAIL: last completed transaction was at log time
2015-05-21 21:30:13.377052+02
May 21 21:30:15 d8 postgres[53579]: [7-1] 2015-05-21 21:30:15.390 CEST
@ 53579 LOG: database system is shut down

-- I changed the restore_command; in the meantime the master had managed
-- to ship 0000000400004C4D00000093 in the background.
-- Starting up.
May 21 21:30:25 d8 postgres[54546]: [1-1] 2015-05-21 21:30:25.983 CEST
@ 54546 LOG: loaded library "pg_stat_statements"
May 21 21:30:26 d8 postgres[54546]: [2-1] 2015-05-21 21:30:26.329 CEST
@ 54546 LOG: ending log output to stderr
May 21 21:30:26 d8 postgres[54546]: [2-2] 2015-05-21 21:30:26.329 CEST
@ 54546 HINT: Future log output will go to log destination
"syslog".
May 21 21:30:26 d8 postgres[54547]: [3-1] 2015-05-21 21:30:26.331 CEST
@ 54547 LOG: database system was shut down in recovery at
2015-05-21 21:30:15 CEST
May 21 21:30:26 d8 postgres[54547]: [4-1] 2015-05-21 21:30:26.626 CEST
@ 54547 LOG: entering standby mode
May 21 21:30:27 d8 postgres[54563]: [3-1] 2015-05-21 21:30:27.026 CEST
pgsql(at)postgres 54563 [local] [unknown] FATAL: the database system is
starting up
May 21 21:30:27 d8 postgres[54547]: [5-1] 2015-05-21 21:30:27.068 CEST
@ 54547 LOG: restored log file "0000000400004C4D00000091" from
archive
May 21 21:30:27 d8 postgres[54547]: [6-1] 2015-05-21 21:30:27.129 CEST
@ 54547 LOG: redo starts at 4C4D/91A59348
May 21 21:30:27 d8 postgres[54547]: [7-1] 2015-05-21 21:30:27.301 CEST
@ 54547 LOG: restored log file "0000000400004C4D00000092" from
archive
May 21 21:30:27 d8 postgres[54547]: [8-1] 2015-05-21 21:30:27.532 CEST
@ 54547 LOG: restored log file "0000000400004C4D00000093" from
archive
May 21 21:30:27 d8 postgres[54547]: [9-1] 2015-05-21 21:30:27.957 CEST
@ 54547 LOG: consistent recovery state reached at 4C4D/946CE620
May 21 21:30:27 d8 postgres[54547]: [10-1] 2015-05-21 21:30:27.957
CEST @ 54547 LOG: unexpected pageaddr 4C45/CD6D4000 in log segment
0000000400004C4D00000094, offset 7159808
May 21 21:30:27 d8 postgres[54546]: [3-1] 2015-05-21 21:30:27.957 CEST
@ 54546 LOG: database system is ready to accept read only
connections
May 21 21:30:27 d8 postgres[54580]: [4-1] 2015-05-21 21:30:27.969 CEST
@ 54580 LOG: started streaming WAL from primary at 4C4D/94000000 on
timeline 4
May 21 21:30:28 d8 postgres[54581]: [4-1] 2015-05-21 21:30:28.039 CEST
pgsql(at)postgres 54581 [local] [unknown] FATAL: role "pgsql" does not
exist
May 21 21:36:58 d8 postgres[54564]: [3-1] 2015-05-21 21:36:58.080 CEST
@ 54564 LOG: restartpoint starting: time
-- And everything goes OK.

--
Piotr Gasidło
