Logical replication & corrupted pages recovery

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Logical replication & corrupted pages recovery
Date: 2017-05-26 09:23:56
Message-ID: c403f9b1-80fa-2bcf-da31-4d180aa65aae@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Several PgPro cstomers, which are evaluating our multimaster, are
interested in possibility to recover corrupted pages from other cluster
nodes.
This task seems to be more general and is not multimaster specific. This
is why I want to discuss it here.

With physical (streaming) replication content of master and replica
database are identical, so it is quite easy do restore corrupted page
from the replica by just copying correspondent file or part of file.
With logical replication content of database pages on the disk may be
different even through data is logically identical.
If some heap page is corrupted, then there is no some simple and
efficient way to determine records which were located on this page.
Clustered indexes can help, but this is a long story...

So my question is whether there is now some efficient way to synchronize
two tables?
If not, are there any plans to provide such functionality in logical
replication in future?

Right now, the only approach which comes to me mind is to extract all
primary keys at two nodes, exchanges them between nodes, find out
missing tuples by comparing two ordered set of keys and request them
from other node. It is based on the assumption that Postgres just skips
records from the corrupted pages.
The drawback of this approach is that it will be very slow and cause
large network traffic for huge tables.
May be it is possible to somehow optimize it, by checking ranges of
primary key values
(if number of records in the range is the same at both nodes, then
ranges can be considered as identical and not compared).

Also this approach requires suspending of cluster while table
synchronization (or at least, locking this table).
Synchronization of table in case of presence of active updates of this
tables seems to be much more challenged task.

If somebody has already thought about this problem, have some plan or
may be even ready solution for it, please share your thoughts.
Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tushar 2017-05-26 12:17:01 Re: logical replication - still unstable after all these months
Previous Message Erik Rijkers 2017-05-26 08:45:33 Re: logical replication - still unstable after all these months