From: | Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Detecting schema changes during logical replication |
Date: | 2017-05-07 21:54:26 |
Message-ID: | CA+mi_8bJ_uPr67j-6mbin537DVvfk=bOhmWneyBRfbZu89q0tw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, May 7, 2017 at 8:04 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2017-05-07 19:27:08 +0100, Daniele Varrazzo wrote:
>> I'm putting together a replication system based on logical
>> replication.
>
> Interesting. If you very briefly could recap what it's about... ;)
I need to replicate some tables from a central database into the
database that should run a secondary system. For a similar use case we
have used Londiste in the past, which has served us good, but its
usage has not been problem-free. Logical decoding seems much less
invasive on the source database than a trigger-based replication
solution, and has less moving part to care about and maintain.
For the moment I'm hacking into a fork of Euler project for wal
decoding into json (https://github.com/dvarrazzo/wal2json) mostly
adding configurability, so that we may be able to replicate only the
tables we need, skip certain fields etc. I'm also taking a look at
minimising the amount of information produced: sending over and over
the column names and types for every record seems a waste, hence my
question.
>> I would like to send table information only the first
>> time a table is seen by the 'change_cb' callback, but of course there
>> could be some schema change after replication started. So I wonder: is
>> there any information I can find in the 'Relation' structure of the
>> change callback, which may suggest that there could have been a change
>> in the table schema, hence a new schema should be sent to the client?
>
> The best way I can think of - which is also what is implemented in the
> in-core replication framework - is to have a small cache on-top of the
> relcache. That cache is kept coherent using
> CacheRegisterRelcacheCallback(). Then whenever there's a change you
> look up that change in that cache, and send the schema information if
> it's been invalidated since you last sent something. That's also how
> the new stuff in v10 essentially works:
> src/backend/replication/pgoutput/pgoutput.c
>
> pgoutput_change(), does a lookup for its own metadata using get_rel_sync_entry()
> which then checks relentry->schema_sent. Invalidation unsets
> schema_sent in rel_sync_cache_relation_cb.
Thank you very much, it seems exactly what I need. I'll try hacking
around this callback.
-- Daniele
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2017-05-07 23:17:31 | logical replication deranged sender |
Previous Message | Fabien COELHO | 2017-05-07 20:55:44 | Re: proposal psql \gdesc |