Re: WAL accumulating, Logical Replication pg 13

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Willy-Bas Loos <willybas(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: WAL accumulating, Logical Replication pg 13
Date: 2021-05-28 20:20:14
Message-ID: CAM+6J96NsD1PN+_TypHRUM3eft+uksep3mAgmeSqjEG6ocBGFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I am not too sure with 9.3
i tried an upgrade from 9.6 to 11 using logical replication (pg_logical
extension)

one thing to note.
logical replication initiates a copy from a snapshot, then changes from
then on.

I had a very high insert rate on my source tables (v9.6) and the
destination (v11) could not keep up (it had tons of indexes when I copied
the schema) and it took around a day as the table had around 12 indexes.

So at the destination(v11), I dropped all but the primary index for each
table, started subscription and when it was almost caught up, rebuilt the
index on the destination concurrently.
it completed in 4-5 hours without stopping the source.
migration completed in a few mins :)

not sure if this would help, but just FYI.

On Sat, 29 May 2021 at 01:36, Willy-Bas Loos <willybas(at)gmail(dot)com> wrote:

> Hi , I'm upgrading a 1.5TB database from postgres 9.3 to postgres 13 on
> Debian 10. This is now in an Acceptance stage (DTAP). I have encountered a
> problem: the WAL is not being deleted. I now have 1.4 TB of WAL in pg_wal
> and my disks are getting full. The oldest WAL file is 18 days old.
> I use Logical Replication from the new cluster to another new cluster with
> 1 subscriber and 1 subscription.
>
> pg_stat_subscription tells me all recent timestamps.
> and this:
> db=# select * from pg_replication_slots;
> -[ RECORD 1 ]-------+-------------
> slot_name | my_pub1
> plugin | pgoutput
> slot_type | logical
> datoid | 16401
> database | db
> temporary | f
> active | t
> active_pid | 9480
> xmin |
> catalog_xmin | 269168
> restart_lsn | D4/908BC268
> confirmed_flush_lsn | E1/25BF5710
> wal_status | extended
> safe_wal_size |
>
>
>
> I've had problems with diskspace on this server, with postgres crashing
> because of it, then added more diskspace and postgres recovered. This
> doesn't seem to be a problem now.
>
> The *publication* has the options publish = 'insert, update, delete,
> truncate', publish_via_partition_root = false
> The *subscription* has the options connect = true, enabled = true,
> create_slot = false, slot_name = my_pub1, synchronous_commit = 'off'
>
> The log on the publisher says:
> 2021-05-25 21:25:18.973 CEST [4584] user(at)db LOG: starting logical
> decoding for slot "my_pub1"
> 2021-05-25 21:25:18.973 CEST [4584] user(at)db DETAIL: Streaming
> transactions committing after D6/A82B5FE0, reading WAL from D4/908BC268.
> 2021-05-25 21:25:18.973 CEST [4584] user(at)db LOG: logical decoding found
> consistent point at D4/908BC268
> 2021-05-25 21:25:18.973 CEST [4584] user(at)db DETAIL: There are no running
> transactions.
> 2021-05-25 21:29:49.456 CEST [4614] user(at)db ERROR: replication slot
> "my_pub1" is active for PID 4584
> 2021-05-25 21:29:54.474 CEST [4615] user(at)db ERROR: replication slot
> "my_pub1" is active for PID 4584
>
> And on the subscriber:
> 2021-05-28 21:23:46.702 CEST [40039] LOG: logical replication apply
> worker for subscription "my_pub1" has started
> 2021-05-28 21:23:46.712 CEST [40039] ERROR: could not start WAL
> streaming: ERROR: replication slot "my_pub1" is active for PID 730
> 2021-05-28 21:23:46.714 CEST [19794] LOG: background worker "logical
> replication worker" (PID 40039) exited with exit code 1
>
> The postgres settings on the *publisher* are:
> max_connections = 100 # (change requires restart)
> tcp_keepalives_idle = 120 # TCP_KEEPIDLE, in seconds;
> shared_buffers = 50GB # min 128kB
> work_mem = 1GB # min 64kB
> maintenance_work_mem = 10GB # min 1MB
> logical_decoding_work_mem = 5GB # min 64kB
> dynamic_shared_memory_type = posix # the default is the first option
> max_worker_processes = 20 # (change requires restart)
> max_parallel_maintenance_workers = 10 # taken from max_parallel_workers
> max_parallel_workers_per_gather = 5 # taken from max_parallel_workers
> max_parallel_workers = 15 # maximum number of max_worker_processes that
> wal_level = logical # minimal, replica, or logical
> max_wal_size = 1GB
> min_wal_size = 80MB
> #archive_mode = off
> max_wal_senders = 10 # max number of walsender processes
> wal_sender_timeout = 1min # in milliseconds; 0 disables
> max_replication_slots = 7 # max number of replication slots
>
> On postgres settings on the *subscriber*:
> max_connections = 100 # (change requires restart)
> tcp_keepalives_idle = 120 # TCP_KEEPIDLE, in seconds;
> shared_buffers = 25GB # min 128kB
> work_mem = 1GB # min 64kB
> maintenance_work_mem = 10GB # min 1MB
> logical_decoding_work_mem = 5GB # min 64kB
> dynamic_shared_memory_type = posix # the default is the first option
> max_worker_processes = 20 # (change requires restart)
> max_parallel_maintenance_workers = 10 # taken from max_parallel_workers
> max_parallel_workers_per_gather = 5 # taken from max_parallel_workers
> max_parallel_workers = 15 # maximum number of max_worker_processes that
> wal_level = logical # minimal, replica, or logical
> max_wal_size = 3GB
> min_wal_size = 80MB
> #archive_mode = off
> wal_receiver_timeout = 1min # time that receiver waits for
> max_logical_replication_workers = 10 # taken from max_worker_processes
> max_sync_workers_per_subscription = 5 # taken from
> max_logical_replication_workers
>
> I've tried increasing wal_sender_timeout and wal_receiver_timeout to 10
> minutes each, but this had no positive effect.
>
> Some advice would be helpful
> --
> Willy-Bas Loos
>

--
Thanks,
Vijay
Mumbai, India

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dean Gibson (DB Administrator) 2021-05-28 22:06:10 Re: AWS forcing PG upgrade from v9.6 a disaster
Previous Message Willy-Bas Loos 2021-05-28 20:05:58 WAL accumulating, Logical Replication pg 13