Large Data insert on Master server fills up /pgsql base dir when Logical replication (pglogical) is active

From: Siddhartha Gurijala <gurijala(at)nodalexchange(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Large Data insert on Master server fills up /pgsql base dir when Logical replication (pglogical) is active
Date: 2022-04-20 22:35:51
Message-ID: CACWx9F9ECoWPwaBJXAYoHsay-Re-Z1C4WhfUQwi_oj22s=oCfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello Postgres community,
I have a database cluster that crashed in a way I don’t understand.
Some details about the setup:

- The database that crashed is running postgres 14.1
- This database has three physical standbys using repmgr
- The database allows another database in the same location to subscribe
to logical replication subscriptions
- The $PGDATA directory has 1.8TB total storage and was at 50% usage
before this issue occurred
- The pg_wal directory is symlinked to another partition that has 1.4Tb
of available space

Recently during a deployment, there was a single transaction that converted
many large tables from inheritance-based partitioned tables into
declarative partitioned tables. The estimate of the data moved in this
transaction is about 150GB of data. It should also be noted that all of
these tables are part of a replication set for logical replication.
The way the SQL in this transaction is written:

- BEGIN
- In a DO block, create unlogged temporary tables as select * from
(inheritance partitioned tables)
- In a DO block, drop the old inheritance partitioned tables
- Multiple CREATE TABLE statements to recreate these tables using
declarative partitioning
- DO blocks to create the individual partitions and indexes
- DO block to set proper permissions
- DO block that inserts the data from the unlogged temporary tables into
the base tables and drops the temporary tables
- END

All the SQL for this is being run as part of multiple DO pgplsql blocks
within a single transaction. In the original failure, the tables being
replaced with declarative partitions were part of the replication set with
an active subscriber.
I thought that maybe my issue is that I shouldn’t have any active
subscriptions that are replicating tables that will be dropped and
recreated within a transaction. So, I tried something different: with the
same active subscriber, I first removed the tables from the replication set
and then executed the transaction that replaces the inheritance tables with
the declarative versions. Even with the tables already removed from the
replication set, the $PGDATA/base directory keeps growing until the disk is
full.
This increase in the disk space for postgres data directory will stop when
I drop the pglogical replication sets and replication slots (in other words
just dropping the provider node).I have confirmed it’s not the pg_wal
that’s holding up space.
We did a similar migration last year (using identical SQL with different
table names) when our PG was on 10.12 and didn’t see this issue. Hence, I
am confused on why having an active replication slot but no target table
part of it will cause the postgres to fill up the base disk space. I have
also compared the schema and table sizes of these target tables from before
and after my sql run and there don’t seem to be any difference.
Postgres Version on Master database: 14.1
Postgres version on Subscriber database cluster: 14.1
Pglogical version on master: pglogical 2.4.0-1.rhel7
pglogical version on subscriber: pglogical 2.4.0-1.rhel7
I also tested this after upgrading pglogical to 2.4.1 and still found it to
have the same issue.

Thanks & Regards,
Siddhartha Gurijala

Browse pgsql-general by date

  From Date Subject
Next Message senor 2022-04-20 23:06:25 autovacuum_freeze_max_age on append-only tables
Previous Message Benedict Holland 2022-04-20 22:20:44 Re: Are stored procedures/triggers common in your industry