Re: Built-in plugin for logical decoding output

From: Gregory Brail <gregbrail(at)google(dot)com>
To: Alvaro Hernandez <aht(at)ongres(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Built-in plugin for logical decoding output
Date: 2017-09-25 16:59:40
Message-ID: CAFF4x12pGTq3NFfEn5t9afwJ=ir_8TAhE+LnKUsObh7aW3SdEQ@mail.gmail.com
Lists: pgsql-hackers

I'm encouraged that pgoutput exists and I'm sorry that I missed it before.
I think it's fine as a binary-only format. If someone can write a client
for the Postgres wire protocol as documented in Chapter 52 of the docs,
then they should have no trouble consuming the output from pgoutput.
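As an illustration of what consuming that binary format involves, here is a minimal Python sketch (not from the original mail) that decodes a pgoutput Begin ('B') message, assuming the layout defined in the PostgreSQL source (src/include/replication/logicalproto.h): a type byte, the final LSN (Int64), the commit timestamp in microseconds since 2000-01-01 (Int64), and the transaction XID (Int32), all big-endian:

```python
import struct
from datetime import datetime, timedelta, timezone

# Epoch used by PostgreSQL timestamps: 2000-01-01 00:00:00 UTC
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

def parse_begin(msg: bytes) -> dict:
    """Parse a pgoutput Begin ('B') message: Int64 final LSN,
    Int64 commit timestamp (microseconds since 2000-01-01),
    Int32 xid; all fields big-endian."""
    if msg[0:1] != b"B":
        raise ValueError("not a Begin message")
    final_lsn, commit_ts_us, xid = struct.unpack("!QQI", msg[1:21])
    return {
        "final_lsn": final_lsn,
        "commit_time": PG_EPOCH + timedelta(microseconds=commit_ts_us),
        "xid": xid,
    }

# Hypothetical message bytes, constructed here only for demonstration:
sample = b"B" + struct.pack("!QQI", 0x16B3748, 0, 561)
print(parse_begin(sample)["xid"])  # 561
```

The other message types (Relation, Insert, Update, Delete, Commit) follow the same style of fixed-width, network-byte-order fields, which is why a client that already speaks the wire protocol should cope with them.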

However, I can't find any docs for the output format of pgoutput, which is
going to make it less likely for people to be able to consume it. Is anyone
working on docs? I know that it's a painful process.

I also think that a JSON-format (or configurable format) plugin would make
this part of PG much more usable and I'd encourage the community to come up
with one.
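For a sense of what such a plugin might emit, here is a hypothetical change record in the style of the existing third-party wal2json plugin (field names shown are wal2json's, not a proposal for the built-in format):

```json
{
  "change": [
    {
      "kind": "insert",
      "schema": "public",
      "table": "config",
      "columnnames": ["id", "value"],
      "columntypes": ["integer", "text"],
      "columnvalues": [42, "enabled"]
    }
  ]
}
```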

Finally, since there were some "why didn't you just" questions in the email
thread, let me write a little bit about what we were trying to do.

We have a set of data that represents the configuration of some of our
customers' systems. (This is for Apigee Edge, which is a software product
that represents a small part of Google Cloud, and which was developed long
before we joined Google.) We'd like to efficiently and reliably push
configuration changes down to our customers' systems, mostly to make it
possible for them to run parts of our software stack in their own data
centers, with limited or even unreliable network connectivity to the rest
of our services. Data replication is a great fit for this problem.

However, we want the downstream software components (the ones that our
customers run in their own data centers) to know when various things
change, we want those changes delivered in a consistent order, and we want
to be able to reliably receive them by having each consumer keep track of
where they currently are in the replication scheme. Logical replication is
a great fit for this because it enables us to build a list of all the
changes to this management data in a consistent order. Once we have that
list, it's fairly simple to persist it somewhere and let clients consume it
in various ways. (In our case, via an HTTP API that supports long polling.
Having all the clients consume a Kafka stream was not an option that we
wanted to consider.)
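The consumption model above can be sketched in a few lines. This is an illustrative in-memory stand-in, not Apigee's actual implementation: a persisted, consistently ordered change list, plus a long-poll call where each client tracks only its own position:

```python
import threading

class ChangeFeed:
    """In-memory stand-in for a persisted, consistently ordered
    change list. Names and API are hypothetical illustrations."""
    def __init__(self):
        self._changes = []           # ordered list of (sequence, payload)
        self._cond = threading.Condition()

    def append(self, payload):
        with self._cond:
            seq = len(self._changes) + 1
            self._changes.append((seq, payload))
            self._cond.notify_all()  # wake any blocked long-pollers
            return seq

    def poll(self, since: int, timeout: float = 30.0):
        """Long poll: return changes after position `since`, blocking
        up to `timeout` seconds if none are available yet."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._changes) > since,
                                timeout=timeout)
            return self._changes[since:]

# A client remembers only its position in the feed:
feed = ChangeFeed()
feed.append({"table": "config", "op": "update"})
cursor = 0
for seq, change in feed.poll(since=cursor, timeout=0.1):
    cursor = seq   # persist this position to resume after restarts
```

Because the feed is ordered and positions are durable on the client side, a consumer that disconnects simply resumes from its last persisted cursor, which is the same property a replication slot provides, pushed out to the edge.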

The difference between what we're trying to do and most solutions that use
logical replication is that we will have thousands or tens of thousands of
clients pulling a list of changes that originated in a single Postgres
database. That means that we need to index our own copy of the replication
output so that clients can efficiently get changes only to "their" data.
Furthermore, it means that we can't do things like create a unique
replication slot for each client. Instead, we have a smaller number of
servers that replicate from the master, and then those in turn give out
lists of changes to other clients.
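The per-client indexing step might look like the following sketch (hypothetical names, assuming a single tenant key identifies "their" data): the intermediate servers ingest the single replication stream, stamp each change with a global sequence number to preserve the consistent order, and index by tenant so each client fetches only its own changes:

```python
from collections import defaultdict

class TenantIndexedLog:
    """Sketch of indexing one replicated change stream by tenant so
    thousands of clients can each fetch only 'their' changes."""
    def __init__(self):
        self._by_tenant = defaultdict(list)   # tenant -> [(seq, change)]
        self._seq = 0

    def ingest(self, tenant, change):
        # Changes arrive in the single, consistent replication order;
        # a global sequence number preserves that order per tenant.
        self._seq += 1
        self._by_tenant[tenant].append((self._seq, change))

    def changes_for(self, tenant, since=0):
        return [(s, c) for s, c in self._by_tenant[tenant] if s > since]

log = TenantIndexedLog()
log.ingest("acme", {"op": "insert"})
log.ingest("globex", {"op": "update"})
log.ingest("acme", {"op": "delete"})
print(log.changes_for("acme", since=1))  # [(3, {'op': 'delete'})]
```

This is why one replication slot per client is a non-starter at this scale: the fan-out happens in the indexed copy, and the master only ever sees the small number of intermediate replicas.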

On Mon, Sep 25, 2017 at 9:48 AM, Alvaro Hernandez <aht(at)ongres(dot)com> wrote:

>
>
> On 25/09/17 19:39, Petr Jelinek wrote:
>
>>
>> Well, test_decoding is not meant for production use anyway, no need for
>> middleware to support it. The pgoutput is primarily used for internal
>> replication purposes, which is why we need something with more
>> interoperability in mind in the first place. The new plugin should still
>> support publications etc though IMHO.
>>
>>> However, having said that, and while json is a great output format
>>> for interoperability, if there's a discussion on which plugin to include
>>> next, I'd also favor one that has some more compact representation
>>> format (or that supports several formats, not only json).
>>>
>> JSON is indeed great for interoperability, if you want more compact
>> format, use either pgoutput or write something of your own or do
>> conversion to something else in your consumer. I don't think postgres
>> needs to provide 100 different formats out of the box when there is an
>> API. The JSON output does not have to be extremely chatty either btw.
>>
>>
> In my opinion, logical decoding plugins that don't come with core are
> close to worthless (don't get me wrong):
>
> - They very unlikely will be installed in managed environments (an area
> growing significantly).
> - As anything that is not in core, raises concerns by users.
> - Distribution and testing are non-trivial: many OS/archs combinations.
>
> Given the above, I believe having a general-purpose output plugin
> in-core is critical to the use of logical decoding. As for 9.4-9.6 there is
> test_decoding, and given that AWS uses it for production, that's kind of
> fine. For 10 there is at least pgoutput, which could be used (even though
> it was meant for replication). But if a new plugin is to be developed for
> 11+, one really general purpose one, I'd say json is not a good choice if
> it is the only output it would support. json is too verbose, and
> replication, if anything, needs performance (it is both network heavy and
> serialization/deserialization is quite expensive). Why not, if one and only
> one plugin would be developed for 11+, general purpose, do something that
> is, indeed, more general, i.e., that supports high-performance scenarios
> too?
>
>
>
> Álvaro
>
> --
>
> Alvaro Hernandez
>
>
> -----------
> OnGres
>
>
