Re: BUG #17524: Increase in WAL size due to logical replication with publication contain a table with low activity.

From: Raman Kumar <raman(dot)kumar(at)r-indventures(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17524: Increase in WAL size due to logical replication with publication contain a table with low activity.
Date: 2022-06-23 06:33:09
Message-ID: CAG8FmLVDgxkyBPz4d6t9yy63EvzVjcN0Y1nzyhpyV7H3OkkEmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

We are using a debezium postgres kafka connector, it does handle the empty
transaction as far as i know.

About the large transactions, no there aren't any as most of the time the
disk usage spikes when the activity is really low.

I am using AWS RDS to host the postgres instance, so I don't really have an
idea on how to check the data/pg_replslot directory.

In order to mitigate the issue temporarily, We have added a dummy table in
the publication on which we are inserting random. This keeps the
replication lag under control and avoid filling up the disk.

On Tue, Jun 21, 2022 at 7:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Mon, Jun 20, 2022 at 7:07 PM PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference: 17524
> > Logged by: Raman Kumar
> > Email address: raman(dot)kumar(at)r-indventures(dot)com
> > PostgreSQL version: 13.6
> > Operating system: Ubuntu
> > Description:
> >
> > We are using logical replication to source the event from postgres. and
> > publication includes a single table with (Insert, update, delete and
> > truncate ).
> > Everything works great, Until we run into a strange issue i.e. Increase
> is
> > the disk usage, abnormally.
> > On looking further we have found that some of the disk space occupy by
> the
> > wal.
> > Using this query
> > ```
> > Select slot_name, pg_current_wal_lsn(), confirmed_flush_lsn,
> > (pg_current_wal_lsn() - confirmed_flush_lsn) as lag from
> > pg_replication_slots;
> > ```
> > ```
> > "slot_name" "pg_current_wal_lsn" "confirmed_flush_lsn" "lsn_distance"
> > "poc_kafka_connect" "109C/AC001020" "109C/94909990" 393180816
> > ```
> >
> > ```
> > select usename, client_addr, pg_current_wal_lsn(), flush_lsn,
> > (pg_current_wal_lsn() - flush_lsn) as lag from pg_stat_replication;
> > ```
> >
> > ```
> > "usename" "client_addr" "pg_current_wal_lsn" "flush_lsn" "lag"
> > "postgres" "x.x.x.x" "109C/AC0011A0" "109C/94909990" 393181200
> > ```
> >
> > We have notice the replication lag. But as the publication includes on
> the
> > single table, the last changes are already replicated/published.
> > So Technically there is no replication lag as per publication.
> >
>
> The lag is computed based on the difference between the WAL location
> written in the publisher and the WAL location confirmed by the
> subscriber as replayed. Normally for non-published data, the latest
> WAL location is confirmed by the subscriber either because we send an
> empty transaction (pair of BEGIN ... COMMIT commands) or via
> keep-alive messages. Now, it is not clear why none of those is working
> in your case? Are there any large transactions happening on the
> non-published tables? You can once check data/pg_replslot directory to
> confirm if there are any spill files present there?
>
> --
> With Regards,
> Amit Kapila.
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrey Borodin 2022-06-23 12:39:03 Re: BUG #17528: ERROR: could not access status of transaction 1997627701
Previous Message Michael Paquier 2022-06-23 06:00:00 Re: BUG #17522: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL test fails on OpenBSD 7.1