Re: Re: Alter subscription..SET - NOTICE message is coming for table which is already removed

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Alter subscription..SET - NOTICE message is coming for table which is already removed
Date: 2017-06-08 07:54:50
Message-ID: CAD21AoD4tEutC44wpsY9mpRR_A_cA2kRBDSgazV+6S6YHv5Cuw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 8, 2017 at 5:36 AM, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 5/30/17 13:25, Masahiko Sawada wrote:
>> I think this cause is that the relation status entry could be deleted
>> by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
>> starting. Attached patch fixes issues reported on this thread so far.
>
> I have committed the part of the patch that changes the
> SetSubscriptionRelState() calls.
>

Thank you!

> I think there was a mistake in your patch, in that the calls in
> LogicalRepSyncTableStart() used true once and false once. I think all
> the calls in tablesync.c should be the same.

Yes, you're right.

> (If you look at the patch again, notice that I have changed the
> insert_ok argument to update_only, so true and false are flipped.)

Okay.

> I'm not convinced about the change to the GetSubscriptionRelState()
> argument. In the examples given, no tables are removed from any
> publications, so I don't see how the claimed situation can happen. I
> would like to see more reproducible examples.

In process_syncing_tables_for_apply(), the apply worker gets the list of
all non-ready tables and tries to launch table sync workers
accordingly. But after it has fetched the list and before it launches
the workers, these tables can be removed from the publication, so a
launched table sync worker cannot find its rel state in
pg_subscription_rel. It depends entirely on timing and happens rarely.
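
As a rough illustration of that window, here is a toy model in plain
Python. It is not PostgreSQL code: the dictionary stands in for
pg_subscription_rel, and every function name is a hypothetical
placeholder for the corresponding step, not an actual backend function.

```python
# Toy model of the timing window; plain Python, not PostgreSQL code.
# All names are hypothetical stand-ins for the catalog and the workers.

STATE_INIT = "i"
STATE_READY = "r"

# Stand-in for pg_subscription_rel: table name -> sync state.
subscription_rel = {"t": STATE_READY, "t1": STATE_INIT, "t2": STATE_INIT}


def get_not_ready_tables(catalog):
    """Apply worker step: snapshot the tables whose state is not ready."""
    return [rel for rel, state in catalog.items() if state != STATE_READY]


def refresh_publication(catalog, published):
    """Refresh step: drop entries for tables that are no longer published."""
    for rel in list(catalog):
        if rel not in published:
            del catalog[rel]


def sync_worker_lookup(catalog, rel):
    """Table sync worker step: look up its own state; None if it is gone."""
    return catalog.get(rel)


# 1. The apply worker takes its snapshot of non-ready tables.
pending = get_not_ready_tables(subscription_rel)

# 2. Before any worker is launched, the subscription is switched back to
#    a publication that only contains table t.
refresh_publication(subscription_rel, published={"t"})

# 3. Workers launched from the stale snapshot find no entry for their table.
results = {rel: sync_worker_lookup(subscription_rel, rel) for rel in pending}
```

In this model every table from the stale snapshot ends up with no
entry, which corresponds to the sync worker failing to find its rel
state.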

The reproduction steps were provided by tushar, but I could also
reproduce it with the following steps.

X cluster ->
=# select 'create table t' || generate_series(1,100) || '(c int);';\gexec -- create 100 tables
=# create table t (c int); -- create one more table
=# create publication all_pub for all tables;
=# create publication one_pub for table t;

Y cluster ->
(create the same 101 tables as well)
=# create subscription hoge_sub connection 'host=localhost port=5432' publication one_pub;
=# alter subscription hoge_sub set publication all_pub; select pg_sleep(1); alter subscription hoge_sub set publication one_pub;
*Error occurs here*

> Right now, if the subscription rel state disappears before the sync
> worker starts, the error kills the sync worker, so things should
> continue working correctly. Perhaps the error message isn't the best.
>

The change to GetSubscriptionRelState() in that patch solves the error
message problem you mentioned. When GetSubscriptionRelState() returns
SUBREL_STATE_UNKNOWN, it means that the subscription rel state could
not be found at that time, so we can emit an error with an appropriate
message.
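
To make the idea concrete, here is a minimal sketch in plain Python
rather than the patch itself. The missing_ok parameter name, the
sentinel value, and the error wording are all assumptions made for
illustration; only the idea of returning an "unknown" state instead of
failing inside the lookup comes from the discussion above.

```python
# Sketch only: plain Python, not the actual patch.
# The parameter name, sentinel value and messages are hypothetical.

SUBREL_STATE_UNKNOWN = "\0"


def get_subscription_rel_state(catalog, rel, missing_ok):
    """Return the state for rel, or the unknown sentinel if it is missing."""
    state = catalog.get(rel)
    if state is None:
        if not missing_ok:
            # Generic failure raised from inside the lookup itself.
            raise LookupError(f"could not find state for relation {rel}")
        return SUBREL_STATE_UNKNOWN
    return state


def start_table_sync(catalog, rel):
    """Caller decides what a missing entry means and words the error."""
    state = get_subscription_rel_state(catalog, rel, missing_ok=True)
    if state == SUBREL_STATE_UNKNOWN:
        raise RuntimeError(
            f"table {rel} is no longer part of the subscription; "
            "table sync worker will exit"
        )
    return state
```

The point of the sketch is only that the caller, which knows it is a
sync worker starting up, is in a better position to word the error than
the generic lookup routine.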

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
