Re: Streaming replication - 11.5

From: Nicola Contu <nicola(dot)contu(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Streaming replication - 11.5
Date: 2020-03-11 21:12:35
Message-ID: CAMTZZh0LwQ5yYZqPTVTXbn42bQPJ=dcQGVmC_JHV4jkhmJ41yQ@mail.gmail.com
Lists: pgsql-general

CPU load on the server being rebuilt? No.
System logs don't show anything relevant, unfortunately.
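
For context on the lag query quoted below: pg_current_wal_lsn() is a WAL control function and cannot run during recovery, which is why it errors on a standby; the recovery-safe equivalent is pg_last_wal_replay_lsn(). The subtraction pg_wal_lsn_diff() performs is plain 64-bit arithmetic on LSNs written as two hex halves. A minimal Python sketch of that arithmetic (helper names are hypothetical; the sample LSNs come from the quoted log):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN like '643A/D8C05F70' into an absolute WAL byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lsn_diff(a: str, b: str) -> int:
    """Same arithmetic as pg_wal_lsn_diff(a, b): bytes of WAL between b and a."""
    return lsn_to_bytes(a) - lsn_to_bytes(b)

# Distance between two consecutive restart points in the quoted log;
# 123360456 bytes is ~120469 kB, matching the logged "distance=120469 kB".
print(lsn_diff("643A/E01AB438", "643A/D8C05F70"))
```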

On Wed, Mar 11, 2020 at 21:34 Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 3/11/20 11:59 AM, Nicola Contu wrote:
> > I am actually cascading.
> > The master is in nyh, the first slave is in Dallas, and the one having
> > problems is in Dallas as well, on the same switch as the one replicating
> > from the master.
> >
> > It always worked; I'm not sure what is wrong now. We just encrypted the
> > disks on all servers.
>
> Do you have before and after on CPU load, I/O throughput?
>
> Do system logs show anything relevant during replication drop out?
>
> >
> >
> > On Wed, Mar 11, 2020 at 18:57 Adrian Klaver
> > <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> >
> > On 3/11/20 2:54 AM, Nicola Contu wrote:
> > > These are the lines before
> > >
> > > 2020-03-11 09:05:08 GMT [127.0.0.1(40214)] [43853]: [1-1]
> > > db=cmdv3,user=zabbix_check ERROR: recovery is in progress
> > > 2020-03-11 09:05:08 GMT [127.0.0.1(40214)] [43853]: [2-1]
> > > db=cmdv3,user=zabbix_check HINT: WAL control functions cannot be
> > > executed during recovery.
> > > 2020-03-11 09:05:08 GMT [127.0.0.1(40214)] [43853]: [3-1]
> > > db=cmdv3,user=zabbix_check STATEMENT: select
> > > greatest(0,pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) from
> > > pg_stat_replication where client_addr ='10.150.20.22'
> > >
> > > That query is made by Zabbix. So I stopped the Zabbix agent and
> > > tested again, but it is still failing, because of this now:
> > >
> > > pg_basebackup: starting background WAL receiver
> > > pg_basebackup: created temporary replication slot
> > > "pg_basebackup_51199"
> > > pg_basebackup: could not receive data from WAL stream: SSL SYSCALL
> > > error: EOF detected
> > > ^C4699810/504983062 kB (70%), 0/1 tablespace
> > > (...ql11/data/base/16401/231363544.2)
> >
> > So you started over with a pg_basebackup?
> >
> > Also from below:
> >
> > 2020-03-11 09:43:53 GMT [10.150.20.22(54906)] [51199]: [1-1]
> > db=[unknown],user=replicator LOG: terminating walsender process due to
> > replication timeout
> >
> > Where are the master and standby in relation to each other network wise?
> >
> > Intervening firewalls, network latency issues?
> >
> >
> >
> > >
> > >
> > > here is the full log, starting right before the last try:
> > >
> > > 2020-03-11 09:22:44 GMT [] [12598]: [4508-1] db=,user= LOG:
> > > restartpoint complete: wrote 19565 buffers (0.2%); 0 WAL file(s)
> > > added, 0 removed, 7 recycled; write=270.014 s, sync=0.009 s,
> > > total=270.036 s; sync files=804, longest=0.001 s, average=0.000 s;
> > > distance=131239 kB, estimate=725998 kB
> > > 2020-03-11 09:22:44 GMT [] [12598]: [4509-1] db=,user= LOG: recovery
> > > restart point at 643A/D8C05F70
> > > 2020-03-11 09:22:44 GMT [] [12598]: [4510-1] db=,user= DETAIL: Last
> > > completed transaction was at log time 2020-03-11 09:22:44.050084+00.
> > > 2020-03-11 09:23:14 GMT [] [12598]: [4511-1] db=,user= LOG:
> > > restartpoint starting: time
> > > 2020-03-11 09:27:44 GMT [] [12598]: [4512-1] db=,user= LOG:
> > > restartpoint complete: wrote 17069 buffers (0.2%); 0 WAL file(s)
> > > added, 0 removed, 17 recycled; write=269.879 s, sync=0.006 s,
> > > total=269.902 s; sync files=811, longest=0.001 s, average=0.000 s;
> > > distance=120469 kB, estimate=665445 kB
> > > 2020-03-11 09:27:44 GMT [] [12598]: [4513-1] db=,user= LOG: recovery
> > > restart point at 643A/E01AB438
> > > 2020-03-11 09:27:44 GMT [] [12598]: [4514-1] db=,user= DETAIL: Last
> > > completed transaction was at log time 2020-03-11 09:27:43.945485+00.
> > > 2020-03-11 09:27:44 GMT [] [12598]: [4515-1] db=,user= LOG:
> > > restartpoint starting: force wait
> > > 2020-03-11 09:29:24 GMT [10.222.8.2(47834)] [50961]: [1-1]
> > > db=cmdv3,user=nis LOG: duration: 1402.004 ms statement: SELECT id,
> > > name, parent_id, parent, short_name, sales_rep_id FROM mmx_clients;
> > > 2020-03-11 09:29:34 GMT [10.222.8.2(47834)] [50961]: [2-1]
> > > db=cmdv3,user=nis LOG: duration: 9493.259 ms statement: SELECT slid,
> > > gnid, sof_id, client_id, product FROM mmx_slids;
> > > 2020-03-11 09:32:14 GMT [] [12598]: [4516-1] db=,user= LOG:
> > > restartpoint complete: wrote 71260 buffers (0.8%); 0 WAL file(s)
> > > added, 0 removed, 13 recycled; write=269.953 s, sync=0.012 s,
> > > total=269.979 s; sync files=760, longest=0.002 s, average=0.000 s;
> > > distance=123412 kB, estimate=611242 kB
> > > 2020-03-11 09:32:14 GMT [] [12598]: [4517-1] db=,user= LOG: recovery
> > > restart point at 643A/E7A30498
> > > 2020-03-11 09:32:14 GMT [] [12598]: [4518-1] db=,user= DETAIL: Last
> > > completed transaction was at log time 2020-03-11 09:32:13.916101+00.
> > > 2020-03-11 09:32:44 GMT [] [12598]: [4519-1] db=,user= LOG:
> > > restartpoint starting: time
> > > 2020-03-11 09:37:14 GMT [] [12598]: [4520-1] db=,user= LOG:
> > > restartpoint complete: wrote 27130 buffers (0.3%); 0 WAL file(s)
> > > added, 0 removed, 12 recycled; write=270.026 s, sync=0.007 s,
> > > total=270.052 s; sync files=814, longest=0.001 s, average=0.000 s;
> > > distance=280595 kB, estimate=578177 kB
> > > 2020-03-11 09:37:14 GMT [] [12598]: [4521-1] db=,user= LOG: recovery
> > > restart point at 643A/F8C351C8
> > > 2020-03-11 09:37:14 GMT [] [12598]: [4522-1] db=,user= DETAIL: Last
> > > completed transaction was at log time 2020-03-11 09:37:14.067443+00.
> > > 2020-03-11 09:37:44 GMT [] [12598]: [4523-1] db=,user= LOG:
> > > restartpoint starting: time
> > > 2020-03-11 09:42:14 GMT [] [12598]: [4524-1] db=,user= LOG:
> > > restartpoint complete: wrote 26040 buffers (0.3%); 0 WAL file(s)
> > > added, 0 removed, 9 recycled; write=269.850 s, sync=0.019 s,
> > > total=269.886 s; sync files=834, longest=0.002 s, average=0.000 s;
> > > distance=236392 kB, estimate=543999 kB
> > > 2020-03-11 09:42:14 GMT [] [12598]: [4525-1] db=,user= LOG: recovery
> > > restart point at 643B/730F3F8
> > > 2020-03-11 09:42:14 GMT [] [12598]: [4526-1] db=,user= DETAIL: Last
> > > completed transaction was at log time 2020-03-11 09:42:13.900088+00.
> > > 2020-03-11 09:42:44 GMT [] [12598]: [4527-1] db=,user= LOG:
> > > restartpoint starting: time
> > > 2020-03-11 09:43:53 GMT [10.150.20.22(54906)] [51199]: [1-1]
> > > db=[unknown],user=replicator LOG: terminating walsender process due to
> > > replication timeout
> > > 2020-03-11 09:47:14 GMT [] [12598]: [4528-1] db=,user= LOG:
> > > restartpoint complete: wrote 20966 buffers (0.2%); 0 WAL file(s)
> > > added, 0 removed, 9 recycled; write=270.048 s, sync=0.014 s,
> > > total=270.085 s; sync files=852, longest=0.001 s, average=0.000 s;
> > > distance=183749 kB, estimate=507974 kB
> > > 2020-03-11 09:47:14 GMT [] [12598]: [4529-1] db=,user= LOG: recovery
> > > restart point at 643B/12680A80
> > > 2020-03-11 09:47:14 GMT [] [12598]: [4530-1] db=,user= DETAIL: Last
> > > completed transaction was at log time 2020-03-11 09:47:14.069731+00.
> > > 2020-03-11 09:47:44 GMT [] [12598]: [4531-1] db=,user= LOG:
> > > restartpoint starting: time
> > >
> > >
> > >
> > > On Wed, Mar 11, 2020 at 01:53 Adrian Klaver
> > > <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> > >
> > > On 3/10/20 8:17 AM, Nicola Contu wrote:
> > > Please post to list also.
> > > Ccing list.
> > >
> > > What came immediately before the temporary file error?
> > >
> > > > 2020-03-10 15:10:17 GMT [[local]] [28171]: [1-1]
> > > > db=postgres,user=postgres LOG: temporary file: path
> > > > "base/pgsql_tmp/pgsql_tmp28171.0", size 382474936
> > > > 2020-03-10 15:10:17 GMT [[local]] [28171]: [4-1]
> > > > db=postgres,user=postgres LOG: could not send data to client:
> > > > Broken pipe
> > > > 2020-03-10 15:10:17 GMT [[local]] [28171]: [5-1]
> > > > db=postgres,user=postgres FATAL: connection to client lost
> > > > 2020-03-10 15:10:26 GMT [] [12598]: [3544-1] db=,user= LOG:
> > > > restartpoint complete: wrote 37315 buffers (0.4%); 0 WAL file(s)
> > > > added, 0 removed, 16 recycled; write=269.943 s, sync=0.039 s,
> > > > total=269.999 s; sync files=1010, longest=0.001 s, average=0.000 s;
> > > > distance=175940 kB, estimate=416149 kB
> > > > 2020-03-10 15:10:26 GMT [] [12598]: [3545-1] db=,user= LOG: recovery
> > > > restart point at 6424/1D7DEDE8
> > > >
> > > > It is a cascade replication
> > > >
> > > > On Tue, Mar 10, 2020 at 15:58 Adrian Klaver
> > > > <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> > > >
> > > > On 3/10/20 2:26 AM, Nicola Contu wrote:
> > > > > Hello,
> > > > > I have two servers connected to the same switch running
> > > > > postgres 11.5
> > > > >
> > > > > I am trying to replicate one of those servers after planned
> > > > > work on the master, so the replica has been lost. It has always
> > > > > worked, but now I get this :
> > > > >
> > > > > pg_basebackup: could not receive data from WAL stream: server
> > > > > closed the connection unexpectedly
> > > > > This probably means the server terminated abnormally
> > > > > before or while processing the request.
> > > > >
> > > > > I don't really understand what the issue is.
> > > >
> > > > I would start with the logs from the Postgres server you are
> > > > taking the backup from.
> > > >
> > > > > I had this issue last week as well in another DC and I had to
> > > > > reboot the slave to make it work (not sure why it helped)
> > > > >
> > > > > Do you know what can cause this?
> > > > >
> > > > > Thank you,
> > > > > Nicola
> > > >
> > > >
> > > > --
> > > > Adrian Klaver
> > > > adrian(dot)klaver(at)aklaver(dot)com
> > > >
> > >
> > >
> > > --
> > > Adrian Klaver
> > > adrian(dot)klaver(at)aklaver(dot)com
> > >
> >
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)aklaver(dot)com
> >
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
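
A note for readers hitting the same "terminating walsender process due to
replication timeout" during a long pg_basebackup over a slower link (for
example, after adding disk encryption): one common mitigation is raising the
replication timeouts and TCP keepalive on both ends. A hedged postgresql.conf
sketch (values are illustrative, not taken from this thread):

```
# postgresql.conf -- illustrative values, not from the thread
wal_sender_timeout = 5min     # upstream; the 60s default triggers the kill seen above
wal_receiver_timeout = 5min   # standby side; keep the two roughly in step
tcp_keepalives_idle = 60      # surface dead connections sooner on long transfers
```

A permanent replication slot (pg_basebackup --create-slot --slot=NAME) can also
keep the upstream from recycling needed WAL while the base backup runs.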
