Re: Logcal replication in large scale

From: Nicklas Avén <nicklas(dot)aven(at)jordogskog(dot)no>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Logcal replication in large scale
Date: 2017-09-20 08:02:40
Message-ID: 1505894560.2239.1.camel@jordogskog.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 2017-09-20 at 09:44 +0200, Magnus Hagander wrote:
> On Wed, Sep 20, 2017 at 8:53 AM, Nicklas Avén <nicklas(dot)aven(at)jordogsko
> g.no> wrote:
> > Hallo all
> >
> >
> >
> > I am thrilled about logical replication in PostgreSQL 10. My head
> > have
> >
> > started spinning about use cases.
> >
> >
> >
> > Would it be possible to use logical replication as a distribution
> >
> > method of data?
>
> As an answer to the generic  question: yes :)
>
>  
> > I think about map data from national mapping authorities. The
> > problem
> >
> > is to get the updates of their data sets. Especially the open data
> > sets
> >
> > are now distributed as files (shape files) in Sweden and as pg_dump
> > in
> >
> > Norway.
> >
> >
> >
> > I guess it is beyond what logical replication is designed for, so I
> > ask
> >
> > what problems that might arise for a scenario like:
> >
> >
> >
> > The distributor has a publication database with logical replication
> >
> > publications of the tables. All users, probably thousands or more,
> >
> > would subscribe to that publication to get an updated copy of the
> > data
> >
> > set.
> >
> >
> >
> > How would the publication server react? I guess the WAL-files will
> > only
> >
> > be written once anyway?
>
> Yes. But they will  need to be kept around until *all* subscribers
> have pulled down their changes. So even one subscriber falling behind
> will mean your WAL will never get cleaned up.
>
> Of course, you can keep some sort of watcher process that kills old
> replication slots.
>
> I am also not sure how well PostgreSQL will react to having thousands
> of replication slots. It's not what the system was designed for I
> believe :)
>
Ok, I have to read me up on how this works. I thought about it as a
bucket of WAL-files that the subscribers just "grab".But of course
there is some bookkeeping to make things work.
I guess there is also mechanisms so a new subscriber can get a complete
table after the publcation have been active for a long time? I mean the
"cleaning up" of Wal-files will not leave new subscribers missing what
is written to the table long ago?

> You might be better of using logical decoding (which goes back to
> 9.4) to stream the data out, but not having each subscriber be a
> postgresql subscriber. Either using it to generate some sort of "diff
> files" that can then be consumed downstream, or by distributing it
> via some kind of dedicated queuing system designed to handle that
> many downstreams.

The thing is that I am not involved in any of those organisations, just
a user that want to give them some good reasons to consider
deistributing through PostgreSQL :-)
>  
> > My guess is that it will be a lower payload than today anyway when
> > the
> >
> > whole data set have to be fetched to get updates.
>
> As always that's going to depend on the number of updates. If the
> same thing is updated 1000 times, then if you use logical replication
> it gets transferred 1000 times. So there are definitely cases when
> there will be *more* traffic with logical replication, but in cases
> like yours I would guess it will be less.
>
> //Magnus
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Güttler 2017-09-20 08:03:15 VM-Ware Backup of VM safe?
Previous Message Thomas Güttler 2017-09-20 07:59:44 Re: Final pg_dumpall should happen in Single-User-Mode