
Re: Interval for launching the table sync worker

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Interval for launching the table sync worker
Date:	2017-04-19 12:42:31
Message-ID:	CAD21AoBPk5RR_1vF=W9gOHsgHSGBrQOS6dVQVKxfQXeBNXT+=Q@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote in <f64d87d1-bef3-5e3e-a999-ba302816a0ee(at)2ndquadrant(dot)com>
>> On 18/04/17 18:14, Peter Eisentraut wrote:
>> > On 4/18/17 11:59, Petr Jelinek wrote:
>> >> Hmm, if we create a hashtable for this, I'd say create a hashtable for
>> >> the whole table_states then. The reason it's a list now is that it
>> >> seemed unnecessary to have a hashtable when it will almost always be
>> >> empty, but there is no need to have both hashtable + list IMHO.
>
> I understand that, but like Peter, I also don't like the frequent
> palloc/pfree in a long-lived context and the double loop.
>
>> > The difference is that we blow away the list of states when the catalog
>> > changes, but we keep the hash table with the start times around. We
>> > need two things with different lifetimes.
>
> On the other hand, a hash seems like overkill. In addition, the
> hash version leaks stale entries as subscriptions are modified,
> and vacuuming them is costly.
>
>> Why can't we just update the hashtable based on the catalog? I mean once
>> the record is not needed in the list, the table has been synced so there
>> is no need for the timestamp either since we'll not try to start the
>> worker again.
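The two-lifetime design Peter describes, together with Petr's suggestion to drop an entry once its table is synced, might be sketched roughly as below. This is a hypothetical, simplified model and not the actual tablesync code: plain integers stand in for PostgreSQL's Oid and TimestampTz (milliseconds here), fixed-size arrays stand in for the List and the dynahash table, and the function names and the launch interval are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's Oid; times are plain milliseconds. */
typedef unsigned int Oid;

#define MAX_RELS 32

/* Short-lived: discarded and rebuilt whenever the catalog changes. */
static Oid table_states[MAX_RELS];
static int n_table_states = 0;

/* Long-lived: last worker start time per relation, kept across catalog changes. */
typedef struct
{
    Oid       relid;
    long long last_start_ms;
    bool      in_use;
} StartTime;

static StartTime start_times[MAX_RELS];

static void
rebuild_table_states(const Oid *relids, int n)
{
    assert(n <= MAX_RELS);
    for (int i = 0; i < n; i++)
        table_states[i] = relids[i];
    n_table_states = n;         /* start_times is deliberately left untouched */
}

static StartTime *
lookup_start_time(Oid relid, bool create)
{
    StartTime *free_slot = NULL;

    for (int i = 0; i < MAX_RELS; i++)
    {
        if (start_times[i].in_use && start_times[i].relid == relid)
            return &start_times[i];
        if (!start_times[i].in_use && free_slot == NULL)
            free_slot = &start_times[i];
    }
    if (!create)
        return NULL;

    assert(free_slot != NULL);  /* fixed capacity in this sketch */
    free_slot->relid = relid;
    free_slot->last_start_ms = 0;
    free_slot->in_use = true;
    return free_slot;
}

/* Returns true (and records the start time) if a worker may be launched now. */
static bool
try_start_sync_worker(Oid relid, long long now_ms, long long interval_ms)
{
    StartTime *entry = lookup_start_time(relid, false);

    if (entry != NULL && now_ms - entry->last_start_ms < interval_ms)
        return false;           /* started too recently; retry later */

    if (entry == NULL)
        entry = lookup_start_time(relid, true);
    entry->last_start_ms = now_ms;  /* a real implementation would launch the worker here */
    return true;
}

/* Petr's suggestion: once a table is synced, its entry can be dropped. */
static void
forget_start_time(Oid relid)
{
    StartTime *entry = lookup_start_time(relid, false);

    if (entry != NULL)
        entry->in_use = false;
}
```

Calling rebuild_table_states() after a catalog change leaves start_times alone, which is the reason the two structures end up with different lifetimes; forget_start_time() marks the point where the pruning Petr proposes would hook in.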

I guess the table sync worker for the same table could need to be
started again. For example, imagine a case where a table belonging
to the publication is removed from it, the corresponding subscription
is refreshed, and then the table is added to the publication again.
The hash table would still hold the timestamp recorded during the
table's first sync, so the table sync after the refresh could have to
wait for the interval.
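To make the cost of such a stale entry concrete, here is a hypothetical sketch of the delay it would impose. The function name, the millisecond clock, and the interval value are assumptions for illustration only; has_last_start == false models the case where the hash table holds no entry for the table.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: how long (in ms) a table sync worker for a table
 * must wait before it may be launched, given the start time remembered
 * for that table, if any.
 */
static long long
sync_start_delay_ms(bool has_last_start, long long last_start_ms,
                    long long now_ms, long long interval_ms)
{
    long long elapsed;

    assert(interval_ms >= 0);
    if (!has_last_start)
        return 0;               /* no remembered start: launch now */

    elapsed = now_ms - last_start_ms;
    return elapsed >= interval_ms ? 0 : interval_ms - elapsed;
}
```

With a 5000 ms interval, a first sync at t = 0 and a re-add at t = 2000, a surviving stale entry gives sync_start_delay_ms(true, 0, 2000, 5000) == 3000, while an entry removed on refresh gives sync_start_delay_ms(false, 0, 2000, 5000) == 0. The extra wait is bounded by one interval.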

>
> Considering the anticipated complexity when many subscriptions
> exist in the list, and the complexity of removing stale entries
> from the hash, I'm inclined toward proposing simply adding a
> 'worker_start' column to pg_subscription_rel. We already have
> similar columns in pg_stat_activity and pg_stat_replication.
>

I was thinking the same. But I'm concerned that the last start time
of the table sync worker is not very useful information for the user,
and we already have a similar value in pg_stat_activity
(pg_stat_replication.backend_start is actually taken from
pg_stat_activity.backend_start). I'm not sure whether it's a good idea
to show slightly shifted timestamps with the same meaning in two
system views.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Re: Interval for launching the table sync worker at 2017-04-19 08:12:24 from Kyotaro HORIGUCHI

Responses

Re: Interval for launching the table sync worker at 2017-04-19 13:07:23 from Petr Jelinek
