| From: | Nicklas Avén <nicklas(dot)aven(at)jordogskog(dot)no> |
|---|---|
| To: | Magnus Hagander <magnus(at)hagander(dot)net> |
| Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: Logcal replication in large scale |
| Date: | 2017-09-20 08:02:40 |
| Message-ID: | 1505894560.2239.1.camel@jordogskog.no |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
On Wed, 2017-09-20 at 09:44 +0200, Magnus Hagander wrote:
> On Wed, Sep 20, 2017 at 8:53 AM, Nicklas Avén <nicklas(dot)aven(at)jordogsko
> g.no> wrote:
> > Hallo all
> >
> >
> >
> > I am thrilled about logical replication in PostgreSQL 10. My head
> > have
> >
> > started spinning about use cases.
> >
> >
> >
> > Would it be possible to use logical replication as a distribution
> >
> > method of data?
>
> As an answer to the generic question: yes :)
>
>
> > I think about map data from national mapping authorities. The
> > problem
> >
> > is to get the updates of their data sets. Especially the open data
> > sets
> >
> > are now distributed as files (shape files) in Sweden and as pg_dump
> > in
> >
> > Norway.
> >
> >
> >
> > I guess it is beyond what logical replication is designed for, so I
> > ask
> >
> > what problems that might arise for a scenario like:
> >
> >
> >
> > The distributor has a publication database with logical replication
> >
> > publications of the tables. All users, probably thousands or more,
> >
> > would subscribe to that publication to get an updated copy of the
> > data
> >
> > set.
> >
> >
> >
> > How would the publication server react? I guess the WAL-files will
> > only
> >
> > be written once anyway?
>
> Yes. But they will need to be kept around until *all* subscribers
> have pulled down their changes. So even one subscriber falling behind
> will mean your WAL will never get cleaned up.
>
> Of course, you can keep some sort of watcher process that kills old
> replication slots.
>
> I am also not sure how well PostgreSQL will react to having thousands
> of replication slots. It's not what the system was designed for I
> believe :)
>
Ok, I have to read me up on how this works. I thought about it as a
bucket of WAL-files that the subscribers just "grab".But of course
there is some bookkeeping to make things work.
I guess there is also mechanisms so a new subscriber can get a complete
table after the publcation have been active for a long time? I mean the
"cleaning up" of Wal-files will not leave new subscribers missing what
is written to the table long ago?
> You might be better of using logical decoding (which goes back to
> 9.4) to stream the data out, but not having each subscriber be a
> postgresql subscriber. Either using it to generate some sort of "diff
> files" that can then be consumed downstream, or by distributing it
> via some kind of dedicated queuing system designed to handle that
> many downstreams.
The thing is that I am not involved in any of those organisations, just
a user that want to give them some good reasons to consider
deistributing through PostgreSQL :-)
>
> > My guess is that it will be a lower payload than today anyway when
> > the
> >
> > whole data set have to be fetched to get updates.
>
> As always that's going to depend on the number of updates. If the
> same thing is updated 1000 times, then if you use logical replication
> it gets transferred 1000 times. So there are definitely cases when
> there will be *more* traffic with logical replication, but in cases
> like yours I would guess it will be less.
>
> //Magnus
>
>
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Thomas Güttler | 2017-09-20 08:03:15 | VM-Ware Backup of VM safe? |
| Previous Message | Thomas Güttler | 2017-09-20 07:59:44 | Re: Final pg_dumpall should happen in Single-User-Mode |