Re: PostgreSQL logical replication depends on WAL segments?

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: Josef Machytka <josef(dot)machytka(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL logical replication depends on WAL segments?
Date: 2019-01-22 17:10:27
Message-ID: CAMa1XUhr1v4QDqVgRRKNYB_vhrnNr2Y7wb0nu2RUxSS4Kx9s8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
> Thanks, I see... So if I understand it correctly - since I have quite big
> partitions like ~30 GB each in one parent table and from ~1GB to ~5 GB in
> several others I presume I had to set wal_keep_segments to some really high
> number and stop our security cronjob cleaning old WAL segments (because we
> already had some problems with almost full disk due to old WAL segments)
> until the whole transfer of snapshot is done. Because only after the whole
> snapshot is transferred logical replication workers start to transfer WAL
> logs reflecting changes done from the moment snapshot was taken...
>
> jm
>

Understand there are other downsides to just keeping around a huge amount
of WAL segments apart from only taking up disk space. None of the data
held in those WAL segments can be vacuumed away while they are left around,
which can lead to significant bloat and performance issues over time.

I'm not exactly clear on your use case, but if you need to just
resychronize data for a single table, there is a built-in way to do that
(actually would be nice if the docs spelled this out).

On publisher:

ALTER PUBLICATION mypub DROP TABLE old_data_table;

On subscriber:

ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH ( COPY_DATA = true);

On publisher:

ALTER PUBLICATION mypub ADD TABLE old_data_table;

On subscriber:

ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH ( COPY_DATA = true);

The last command will resync the table from the current table data,
regardless of the WAL file situation. This is the "normal" way you would
go about resynchronizing data between clusters when a long time has passed,
rather than trying to keep all that WAL around!

So far as I can tell from testing, above pattern is the easiest way to do
this, and it will not resynchronize any of the other tables in your
subscription.

P.S. do heed the advice of the others and get more familiar with the docs
around WAL archiving.

Thanks,
Jeremy

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2019-01-22 17:53:26 Re: PostgreSQL logical replication depends on WAL segments?
Previous Message Alvaro Herrera 2019-01-22 16:47:37 Re: Decrease time needed to CREATE INDEX and FOREIGN KEY on new table column which has all values NULL