Re: Very long standby database startup after doing pg_basebackup

From: Mario Mahovlić <mariomahovlic(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Very long standby database startup after doing pg_basebackup
Date: 2018-03-23 09:07:11
Message-ID: CAG9x_sTTariBA2MQCcQrfqm=RhzXUOy2=hN2wZcnR5u19xXVzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hello Laurenz,

Thank you for the reply, we will try using perf to see what exactly is
going on.
As for the log file all we get from it (even when log level is set to
debug5) is this:

>>> Starting postgres...
> LOG: database system was interrupted; last known up at 2018-03-20
> 08:25:18 UTC
> FATAL: the database system is starting up
> psql: FATAL: the database system is starting up
> >>> Db replication is still not accessible on *NODE* (trying 17255 times)
>
*..... * This goes on for 6 hours * *

LOG: entering standby mode
> LOG: redo starts at 4B/A000028
> LOG: consistent recovery state reached at 4B/A000130
> LOG: database system is ready to accept read only connections
> LOG: started streaming WAL from primary at 4B/B000000 on timeline 35
> DEBUG: CommitTransaction
> DEBUG: name: unnamed; blockState: STARTED; state: INPROGR,
> xid/subid/cid: 0/1/0, nestlvl: 1, children:
> DEBUG: StartTransactionCommand
> DEBUG: StartTransaction

...

Lines with >>> are from our own scripts we use for startup and database
management.

So basically on startup all we get in logs is:
LOG: database system was interrupted; last known up at 2018-03-20 08:25:18
UTC
FATAL: the database system is starting up -> every time we query db until
it starts up successfully

Is there any other way to log more detail during postgres startup?

As i said in initial mail we use -X stream option when doing
pg_basebackup, here is exact pg_basebackup command that is being run:

INFO: executing: '/usr/lib/postgresql/9.6/bin/pg_basebackup -l "repmgr base
> backup" -D /var/lib/postgresql/data -h *NODE_IP* -p *PORT* -U *USER* -c
> fast -X stream '

So yes, we do copy the WAL files, and there is at most 30GB of them (that
is how much we keep), however even if database had to restore all of them 6
hours seems a bit much.

Thank you for the help, kind regards.

On 22 March 2018 at 08:55, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:

> Mario Mahovlić wrote:
> > However, our issue is not with primary cloning, but with database
> startup after
> > clone/basebackup is done.
> > Startup of standby database is successful but it takes around 6 hours,
> during
> > which time all we get from DB is: FATAL: the database system is
> starting up.
> > We have noticed that this startup process uses only 1 cpu core on our
> machine
> > which has 8.
> >
> > Our database size is around 570GB, and we are using schema per tenant
> data
> > distribution, so we have separate schemas for each of the users of our
> > application, as it suits our needs well.
> > Because of this our database has about 40000 schemas with 5 tables per
> each
> > schema.
> > We suspect the reason behind this long startup is amount of data and/or
> large
> > number of schemas/tables on database, however we would like to confirm
> that somehow.
> >
> > So our questions are following:
> > What could be possible reasons for startup as long as this?
> > How can we figure out what exactly does postgres startup process do all
> this time,
> > is there any documentation on what exactly does postgres database do on
> startup as
> > we have not found any info on this?
> > Is there any way to speed up this startup process?
> > Can we mitigate this startup time in cases when we do planned switchover
> for
> > maintenance by using pg_rewind instead of pulling all data from primary
> again?
> > So if we shut down the database gracefully and use pg_rewind will
> startup process
> > be faster?
>
> You could use "perf top" to see where the time is spent or have
> a look at the PostgeSQL log file.
>
> pg_basebackup creates a checkpoint when it starts, so checkpoints that
> happen
> too rarely cannot be the problem.
>
> Did you use "pg_basebackup -X stream" to copy the WAL files along with the
> data?
> If yes, is there a lot of them?
> If no, how do you restore the required WAL files?
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Laurenz Albe 2018-03-23 09:42:47 Re: Very long standby database startup after doing pg_basebackup
Previous Message Heinemann, Manfred (IMS) 2018-03-22 19:30:56 RE: Function search_path