Re: Logical Replication WIP

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Steve Singer <steve(at)ssinger(dot)info>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical Replication WIP
Date: 2016-09-06 09:55:18
Message-ID: e48834c5-1db9-b381-3b7e-8f1ecb04dddd@2ndquadrant.com
Lists: pgsql-hackers

On 05/09/16 23:35, Steve Singer wrote:
> On 09/05/2016 03:58 PM, Steve Singer wrote:
>> On 08/31/2016 04:51 PM, Petr Jelinek wrote:
>>> Hi,
>>>
>>> and one more version with bug fixes, improved code docs and a couple
>>> more tests, some general cleanup, and also rebased on current master
>>> for the start of the CF.
>>>
>>
>
> A few more things I noticed when playing with the patches:
>
> 1. Creating a subscription to yourself ends pretty badly: the
> 'CREATE SUBSCRIPTION' command seems to get stuck, and you can't kill
> it. The background process seems to be waiting for a transaction to
> commit (I assume the CREATE SUBSCRIPTION command's). I had to kill -9
> the various processes to get things to stop. Getting confused about
> hostnames and ports is a common operator error.
>

Hmm, I guess there is a missing interrupts check; I will look. It would
be great to detect this case properly, but I am not really sure how to
do that, as AFAIK there is no accurate way to detect that the connection
is to yourself.

> 2. Failures during the initial subscription aren't recoverable
>
> For example
>
> on db1
> create table a(id serial4 primary key,b text);
> insert into a(b) values ('1');
> create publication testpub for table a;
>
> on db2
> create table a(id serial4 primary key,b text);
> insert into a(b) values ('1');
> create subscription testsub connection 'host=localhost port=5440 dbname=test' publication testpub;
>
> I then get in my db2 log
>
> ERROR: duplicate key value violates unique constraint "a_pkey"
> DETAIL: Key (id)=(1) already exists.
> LOG: worker process: logical replication worker 16396 sync 16387 (PID 10583) exited with exit code 1
> LOG: logical replication sync for subscription testsub, table a started
> ERROR: could not crate replication slot "testsub_sync_a": ERROR: replication slot "testsub_sync_a" already exists
>
>
> LOG: worker process: logical replication worker 16396 sync 16387 (PID 10585) exited with exit code 1
> LOG: logical replication sync for subscription testsub, table a started
> ERROR: could not crate replication slot "testsub_sync_a": ERROR: replication slot "testsub_sync_a" already exists
>
>
> and it keeps looping.
> If I then truncate "a" on db2, it doesn't help (I'd expect the initial
> subscription to work at that point).

Hmm, it looks like the error case does not clean up correctly after itself.

>
> If I then do on db2
> drop subscription testsub cascade;
>
> I still see a slot in use on db1
>
> select * FROM pg_replication_slots ;
>    slot_name    |  plugin  | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
> ----------------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
>  testsub_sync_a | pgoutput | logical   |  16384 | test     | f      |            |      |         1173 | 0/1566E08   | 0/1566E40
>

Same as above.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
