From: | Petr Jelinek <petr(at)2ndquadrant(dot)com> |
---|---|
To: | Steve Singer <steve(at)ssinger(dot)info>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> |
Cc: | Craig Ringer <craig(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Logical Replication WIP |
Date: | 2016-09-06 09:55:18 |
Message-ID: | e48834c5-1db9-b381-3b7e-8f1ecb04dddd@2ndquadrant.com |
Lists: | pgsql-hackers |
On 05/09/16 23:35, Steve Singer wrote:
> On 09/05/2016 03:58 PM, Steve Singer wrote:
>> On 08/31/2016 04:51 PM, Petr Jelinek wrote:
>>> Hi,
>>>
>>> and one more version with bug fixes, improved code docs and couple
>>> more tests, some general cleanup and also rebased on current master
>>> for the start of CF.
>>>
>>>
>>>
>>
>
> A few more things I noticed when playing with the patches
>
> 1. Creating a subscription to yourself ends pretty badly,
> the 'CREATE SUBSCRIPTION' command seems to get stuck, and you can't kill
> it. The background process seems to be waiting for a transaction to
> commit (I assume the create subscription command). I had to kill -9 the
> various processes to get things to stop. Getting confused about
> hostnames and ports is a common operator error.
>
Hmm, I guess there is a missing interrupts check there; I will look. It would be
great to detect this properly, but I am not really sure how to do that, as
AFAIK there is no accurate way to detect that the connection is to yourself.
> 2. Failures during the initial subscription aren't recoverable
>
> For example
>
> on db1
> create table a(id serial4 primary key,b text);
> insert into a(b) values ('1');
> create publication testpub for table a;
>
> on db2
> create table a(id serial4 primary key,b text);
> insert into a(b) values ('1');
> create subscription testsub connection 'host=localhost port=5440
> dbname=test' publication testpub;
>
> I then get in my db2 log
>
> ERROR: duplicate key value violates unique constraint "a_pkey"
> DETAIL: Key (id)=(1) already exists.
> LOG: worker process: logical replication worker 16396 sync 16387 (PID
> 10583) exited with exit code 1
> LOG: logical replication sync for subscription testsub, table a started
> ERROR: could not crate replication slot "testsub_sync_a": ERROR:
> replication slot "testsub_sync_a" already exists
>
>
> LOG: worker process: logical replication worker 16396 sync 16387 (PID
> 10585) exited with exit code 1
> LOG: logical replication sync for subscription testsub, table a started
> ERROR: could not crate replication slot "testsub_sync_a": ERROR:
> replication slot "testsub_sync_a" already exists
>
>
> and it keeps looping.
> If I then truncate "a" on db2 it doesn't help. (I'd expect at that point
> the initial subscription to work)
Hmm, looks like the error case does not clean up after itself correctly.
>
> If I then do on db2
> drop subscription testsub cascade;
>
> I still see a slot in use on db1
>
> select * FROM pg_replication_slots ;
>    slot_name    |  plugin  | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
> ----------------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
>  testsub_sync_a | pgoutput | logical   |  16384 | test     | f      |            |      |         1173 | 0/1566E08   | 0/1566E40
>
Same as above.
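
Until the cleanup is fixed, the leftover slot can presumably be dropped by hand on db1. A minimal sketch, assuming the slot name from the output quoted above and that the slot is still inactive (active = f); this is a manual workaround, not what the patch itself does:

```sql
-- Run on db1 (the publisher).
-- Check that the leftover sync slot exists and is not in use.
SELECT slot_name, active
  FROM pg_replication_slots
 WHERE slot_name = 'testsub_sync_a';

-- Drop it; this fails if a walsender is still attached to the slot.
SELECT pg_drop_replication_slot('testsub_sync_a');
```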
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services