From: | Koen De Groote <kdg(dot)dev(at)gmail(dot)com> |
---|---|
To: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
Cc: | PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: In case of network issues, how long before archive_command does retries |
Date: | 2022-05-20 07:37:00 |
Message-ID: | CAGbX52Ge9nppVjVJOT3SKZVQqLOpzOAsaX4=Nh+MLatksQNYwg@mail.gmail.com |
Lists: | pgsql-general |
Thank you for your thorough explanation.
On Thu, May 19, 2022 at 5:47 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> On Thu, 2022-05-19 at 15:43 +0200, Koen De Groote wrote:
> > On Thu, May 19, 2022 at 9:10 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> > > On Wed, 2022-05-18 at 22:51 +0200, Koen De Groote wrote:
> > > > When connection is gone or blocked, archive_command fails after the timeout specified
> > > > by the NFS mount, as expected. (for a soft mount. hard mount hangs, as expected)
> > > >
> > > > However, on restoring connection, it's not clear to me how long it takes before the command is retried.
> > > >
> > > > Experience says "a few minutes", but I can't find documentation on an exact algorithm.
> > > >
> > > > To be clear, the question is: if archive_command fails, what are the specifics of retrying?
> > > > Is there a timeout? How is that timeout defined?
> > > >
> > > > Is this detailed somewhere? Perhaps in the source code? I couldn't find it in the documentation.
> > > >
> > > > For detail, I'm using postgres 11, running on Ubuntu 20.
> > >
> > > You can find the details in "src/backend/postmaster/pgarch.c".
> > >
> > > The archiver will try to archive three times (NUM_ARCHIVE_RETRIES) at intervals
> > > of one second, then back off until it receives a signal, PostgreSQL shuts down
> > > or a minute has passed.
> >
> > Thanks for the reply. That would mean the source code is here:
> >
> > https://github.com/postgres/postgres/blob/REL_11_0/src/backend/postmaster/pgarch.c
>
> For release 11.0, yes.
>
> > Just to be sure, the "signal" you speak of, this is the result of the command executed by archive_command?
>
> No, that is an operating system signal.
> PostgreSQL processes communicate by sending signals to each other, and if anybody
> wakes up the archiver, it will try again.
>
> > If my understanding of the code is right, if no SIGTERM or other signal arrives, it won't ever happen
> > that a walarchive is skipped if the archive_command fails too many times or takes too long? It
> > will simply check again every 60 seconds (PGARCH_AUTOWAKE_INTERVAL)? Or is the 60 seconds the point
> > where it stops trying, waiting for the next time archive_command is invoked?
>
> Even if a signal arrives, PostgreSQL will keep trying to archive that same WAL segment
> that failed until it is done.
>
> This is a potential sequence of events:
>
> try to archive -> fail
> sleep 1 second
> try to archive -> fail
> sleep 1 second
> try to archive -> fail
> sleep 60 seconds
> try to archive -> fail
> sleep 1 second
> try to archive -> fail
> sleep 1 second
> try to archive -> fail
> sleep 60 seconds -> get woken up by a signal after 30 seconds
> try to archive -> fail
> sleep 1 second
> try to archive -> fail
> get shutdown request -> exit
>
> When PostgreSQL restarts, it will continue trying to archive the same segment.
>
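For anyone who wants to play with the sequence above, here is a rough sketch in Python (pgarch.c itself is C). It only models the timing described in this thread: up to NUM_ARCHIVE_RETRIES attempts one second apart, then a back-off of PGARCH_AUTOWAKE_INTERVAL seconds, repeated for the same segment. The function name `simulate_archiver` and the `archive_once` callback are made up for illustration, sleeps are only recorded rather than performed, and wake-ups by signal and shutdown requests are deliberately left out.

```python
from typing import Callable, List

NUM_ARCHIVE_RETRIES = 3        # attempts per burst, as named in the thread
RETRY_SLEEP_SECONDS = 1        # pause between attempts inside a burst
PGARCH_AUTOWAKE_INTERVAL = 60  # back-off between bursts, in seconds


def simulate_archiver(archive_once: Callable[[], bool], max_bursts: int) -> List[str]:
    """Return the event log of a simplified archiver working on one segment.

    archive_once is a hypothetical stand-in for running archive_command;
    it returns True on success. No real sleeping happens here.
    """
    events: List[str] = []
    for _ in range(max_bursts):
        failures = 0
        while True:
            if archive_once():
                events.append("try to archive -> success")
                return events
            events.append("try to archive -> fail")
            failures += 1
            if failures >= NUM_ARCHIVE_RETRIES:
                break  # give up on this burst, back off
            events.append(f"sleep {RETRY_SLEEP_SECONDS} second")
        # In the real archiver a signal can cut this wait short.
        events.append(f"sleep {PGARCH_AUTOWAKE_INTERVAL} seconds")
    return events


if __name__ == "__main__":
    # Hypothetical archive target that recovers on the fifth attempt.
    outcomes = iter([False, False, False, False, True])
    for line in simulate_archiver(lambda: next(outcomes), max_bursts=5):
        print(line)
```

Under these assumptions, an archive target that comes back right after a failed burst is picked up at the next wake-up, so the wait is at most roughly PGARCH_AUTOWAKE_INTERVAL seconds unless a signal arrives sooner.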
> > I'm assuming that as long as the file is still in the pg_wal directory and as long as there is no
> > ".done" file for that walarchive under pg_wal/archive_status, it will keep trying forever (or until
> > someone forcefully switches the timeline with for instance a basebackup)?
>
> Yes, it will keep trying, and a timeline switch won't change that.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
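On the ".done" question above: a segment that is waiting to be archived has a ".ready" file under pg_wal/archive_status, and that file is renamed to ".done" once archive_command succeeds. A small sketch for checking what is still pending, assuming read access to the data directory; the helper name `pending_segments` and the example path are hypothetical, not part of PostgreSQL.

```python
from pathlib import Path
from typing import List


def pending_segments(archive_status_dir: str) -> List[str]:
    """Return the names of WAL files that still have a .ready marker.

    archive_status_dir is the pg_wal/archive_status directory of the
    cluster; the caller has to supply the real location.
    """
    status = Path(archive_status_dir)
    # Path.stem drops only the final ".ready" suffix.
    return sorted(p.stem for p in status.glob("*.ready"))


if __name__ == "__main__":
    # Hypothetical data directory, adjust to the actual installation.
    for name in pending_segments("/var/lib/postgresql/11/main/pg_wal/archive_status"):
        print(name)
```

If the same name stays at the top of that list while the archive target is unreachable, that is the segment the archiver keeps retrying.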