Re: Performance penalty during logical postgres replication

From: Michael Lewis <mlewis(at)entrata(dot)com>
To: Lars Vonk <lars(dot)vonk(at)gmail(dot)com>
Cc: PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance penalty during logical postgres replication
Date: 2020-12-09 17:26:25
Message-ID: CAHOFxGr7BueMgqBHgLB41d2hUdZi5A8Dhw47vo=dOcgeWs4umw@mail.gmail.com
Lists: pgsql-general

On Wed, Dec 9, 2020 at 2:21 AM Lars Vonk <lars(dot)vonk(at)gmail(dot)com> wrote:

> Hi,
>
> We are doing logical Postgres replication from Postgres 11 to 12. Our
> database is around 700 GB (8 CPUs, 32 GB RAM).
> During the replication process, at some point, we see a huge performance
> penalty on a particular table. This table acts as a queue, with lots of
> inserts and deletes happening throughout the day. Most of the time this
> table is empty, but during this performance penalty it grows to 10,000
> rows and processing is not fast enough to empty it. The main reason, as
> far as we can see, is that the query for selecting the next row to
> process slows down from under 10 ms to 400 ms. This eventually causes
> too much CPU load on the primary and we have to cancel the replication
> process.
> We have already tried the initial load three times, and it consistently
> fails with the same "error". The last try used a per-table approach,
> excluding this "queue" table.
> After cancelling the replication the query is fast again and the load on
> the primary goes back to normal. We see that this happens when
> replicating large tables (millions of rows). During this performance
> penalty, an EXPLAIN of the query selecting the next row from this table
> shows it is doing a sequential scan (there is an index, but it is not
> used).
>
> - What could cause this performance penalty?
> - Is this something other people have experienced as well during the
> initial load of a logical replication with large tables?
> - We are now thinking of temporarily increasing the number of CPUs and
> the amount of RAM for the migration. Would this help in this case?
>
> Thanks in advance,
> Lars
>

I would guess that you are hitting bad plans (a sequential scan instead of
an index scan) because autovacuum is not picking up this table for
vacuum/analyze as often as needed. You could run a periodic 'vacuum analyze
<table>' while the logical replication is going, or review your autovacuum
settings and tune the system to be more aggressive by turning down the
scale factor and cost delay. If the table is primarily "insert only",
particularly with this initial load of data, then autovacuum may not pick
it up at all. That pain point is mitigated by PG13's new behavior of
kicking off autovacuum/analyze based on inserts as well.
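
For example, a minimal sketch (the table name "queue_table" and the
threshold values here are placeholders, not tuned recommendations):

    -- Manually refresh statistics and clean up dead rows while the
    -- initial load is running:
    VACUUM (ANALYZE) queue_table;

    -- Or make autovacuum more aggressive for just this table:
    ALTER TABLE queue_table SET (
        autovacuum_vacuum_scale_factor = 0.01,  -- vacuum sooner
        autovacuum_analyze_scale_factor = 0.01, -- re-analyze sooner
        autovacuum_vacuum_cost_delay = 0        -- don't throttle vacuum I/O
    );

On PG13 and later, the insert-driven behavior mentioned above is governed
by autovacuum_vacuum_insert_threshold and
autovacuum_vacuum_insert_scale_factor.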
