From: | Mladen Marinović <mladen(dot)marinovic(at)kset(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: pg_basebackup connection closed unexpectedly... |
Date: | 2020-02-13 09:41:32 |
Message-ID: | CAHjkqPRwHXw56GYAPO65ZTQsdFXG7YQ2nJX2LFOhyq-A3eF9wg@mail.gmail.com |
Lists: | pgsql-general |
On Wed, Feb 12, 2020 at 4:09 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Mladen Marinović <mladen(dot)marinovic(at)kset(dot)org> writes:
> > Recently I am having some strange problems with pg_basebackup. About once a
> > week the backup process ends with an error message like this:
> > 2020-02-11 23:25:40 UTC [25790]: [1-1] user=replicator,db=[unknown] LOG: could not send data to client: Connection reset by peer
>
> Hmmm ....
>
> > The problem started occurring after a hardware (RAM + SSD) upgrade and an
> > OS Upgrade to Ubuntu 18.04. Both the server and backup process run in
> > separate docker containers on the same machine. This happens randomly on
> > multiple servers with the same configuration and it is probably not
> > hardware related. Also, this happens evenly on 9.4 and 9.6, and using the
> > same docker images that worked flawlessly on the previous installation.
> > I have been investigating the issue for at least a month and found no
> > problems in any log or metric before or after the event. I suspect that
> > this is related to some OS/docker parameter that is not well configured.
>
> How long does the backup run before failing? If the connection were going
> between different machines my suspicions would lean toward a network
> timeout. That seems somewhat unlikely in this configuration, but you
> never know.
>
The backup started at 23:00, and it copied 363GB by the time the connection
was closed. It usually takes about 2 hours for the entire database (approx.
1.1TB). I was also thinking that the problem could be network related, but
the network is a virtual docker bridge network on a single machine, and the
backup is usually ok. If it failed during other operations (as this is a
production database) or during every backup it would be easier to see what
the problem could be, but this is really annoyingly random.
>
> > Would increasing the database log level give me any more info about what
> > caused the connection to close?
>
> Nope, not directly. It might be useful to figure out whether data
> transfer continues full throttle right up until the connection drop,
> or whether it stops sooner (and then there's some sort of timeout
> before the error occurs).
>
I can see that pg_basebackup has a verbose switch, but I am not sure it
will report the stuff you mention. On the database, the log levels
currently are:
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
I assume that I should change the first two to at least debug1 to see
something.
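
One way to answer the "full throttle or stalled" question without touching the server logs might be to sample the size of the backup target while pg_basebackup runs. The following is only a rough sketch under assumptions: the backup is written to a directory the script can read, and the names dir_size_bytes, rate_mb_per_s and watch are made up for illustration, not part of any PostgreSQL tool.

```python
import os
import sys
import time
from datetime import datetime, timezone


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path.

    Files that disappear or cannot be read while walking are skipped.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass
    return total


def rate_mb_per_s(prev_bytes, curr_bytes, interval_s):
    """Average write rate between two samples, in MB/s (1 MB = 10**6 bytes)."""
    return (curr_bytes - prev_bytes) / interval_s / 1e6


def watch(path, interval_s=10, samples=None):
    """Print one timestamped line per interval; runs forever if samples is None."""
    prev = dir_size_bytes(path)
    taken = 0
    while samples is None or taken < samples:
        time.sleep(interval_s)
        curr = dir_size_bytes(path)
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
        rate = rate_mb_per_s(prev, curr, interval_s)
        print(f"{stamp} size={curr} rate={rate:.1f} MB/s", flush=True)
        prev = curr
        taken += 1


if __name__ == "__main__":
    # Hypothetical usage: python watch_backup.py /path/to/backup/target
    watch(sys.argv[1])
```

If the rate drops to 0.0 MB/s some time before the "Connection reset by peer" timestamp, that would point to a stall followed by a timeout; if it stays steady up to the error, the drop was abrupt.
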
> regards, tom lane
>
Regards,
Mladen Marinović