Re: proposal: multiple read-write masters in a cluster with wal-streaming synchronization

From: Mark Dilger <markdilger(at)yahoo(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: multiple read-write masters in a cluster with wal-streaming synchronization
Date: 2014-01-02 19:35:57
Message-ID: 1388691357.90569.YahooMailNeo@web125402.mail.ne1.yahoo.com
Lists: pgsql-hackers

Thanks to both of you for all the feedback.  Your reasoning
about why it is not worth implementing, what the problems
with it would be, etc., is helpful.

Sorry about using the word multimaster where it might
have been better to say sharded.

BTW, since the space shuttle has already left orbit, as you
metaphorically put it, maybe there should be more
visibility to the wider world about this?  You can go to
postgresql.org and find diddly squat about it.  I grant you
that it is not a completed project yet, and so maybe you
want to wait before making major announcements, but
the sort of people who would use this feature are probably
the sort of people who would not mind hearing about it
early.

mark

On Thursday, January 2, 2014 11:18 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

On 2014-01-02 10:18:52 -0800, Mark Dilger wrote:
> I anticipated that my proposal would require partitioning the catalogs.
> For instance, autovacuum could only run on locally owned tables, and
> would need to store the analyze stats data in a catalog partition belonging
> to the local server, but that doesn't seem like a fundamental barrier to
> it working.

It would make every catalog lookup noticeably more expensive.

>  The partitioned catalog tables would get replicated like
> everything else.  The code that needs to open catalogs and look things
> up could open the specific catalog partition needed if it already knew the
> Oid of the table/index/whatever that it was interested in, as the catalog
> partition desired would have the same modulus as the Oid of the object
> being researched. 

Far, far, far from every lookup is by oid. Most prominently the names of
database objects. Those will have to scan every catalog partition. Not
fun.
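
As a rough illustration of the lookup cost discussed above, here is a
minimal Python sketch of a catalog split into partitions by Oid modulus.
Everything in it (N_PARTITIONS, make_catalog, insert, lookup_by_oid,
lookup_by_name) is hypothetical and is not PostgreSQL code. It only
shows that an Oid identifies a single partition, while a name gives no
hint about which partition holds the object.

```python
# Hypothetical model: the catalog is a list of partitions, and an object
# lives in partition (oid % number_of_partitions).
N_PARTITIONS = 4  # assumed number of servers, one catalog partition each


def make_catalog(n=N_PARTITIONS):
    """Create n empty partitions, each mapping oid -> object name."""
    return [dict() for _ in range(n)]


def insert(catalog, oid, name):
    """Store the object in the partition selected by the Oid modulus."""
    catalog[oid % len(catalog)][oid] = name


def lookup_by_oid(catalog, oid):
    """Return (name, partitions_visited). The modulus picks one partition."""
    partition = catalog[oid % len(catalog)]
    return partition.get(oid), 1


def lookup_by_name(catalog, name):
    """Return (matching_oids, partitions_visited).

    The name says nothing about the owning partition, so every
    partition has to be searched.
    """
    matches = []
    visited = 0
    for partition in catalog:
        visited += 1
        matches.extend(oid for oid, n in partition.items() if n == name)
    return sorted(matches), visited
```

With four partitions, lookup_by_oid always visits one partition, while
lookup_by_name visits all four, whether or not the name exists.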

> Your point about increasing the runtime of pg_upgrade is taken.  I will
> need to think about that some more.

It's not about increasing the runtime, it's about simply breaking
it. pg_upgrade relies on binary compatibility of user relations' files,
and you're breaking that if you change the width of datatypes.
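
A toy Python sketch of why a wider datatype breaks binary compatibility.
The row layout here (one Oid followed by a 4-byte integer) and the names
ROW_OID32, ROW_OID64, write_rows and read_rows are assumptions made for
illustration; the real heap tuple format is considerably more involved.

```python
import struct

# Hypothetical fixed-width row: an Oid followed by a 4-byte signed integer.
ROW_OID32 = struct.Struct("<Ii")  # 4-byte Oid, 8 bytes per row
ROW_OID64 = struct.Struct("<Qi")  # 8-byte Oid, 12 bytes per row


def write_rows(layout, rows):
    """Serialize (oid, value) pairs back to back using the given layout."""
    return b"".join(layout.pack(oid, value) for oid, value in rows)


def read_rows(layout, data):
    """Deserialize rows; fail if the data length does not fit the layout."""
    if len(data) % layout.size != 0:
        raise ValueError("file length does not match row layout")
    return [
        layout.unpack_from(data, offset)
        for offset in range(0, len(data), layout.size)
    ]
```

Data written with ROW_OID32 and read back through ROW_OID64 either fails
the length check or decodes to different values, so files laid out for
the old width cannot simply be reused as they are.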

> Your claim that what I describe is not multi-master is at least partially
> correct, depending on how you think about the word "master".  Certainly
> every server is the master of its own chunk.

Well, you're essentially just describing a sharded system - that's not
usually called multimaster.

> Your claim that BDR doesn't have to be much slower than what I am
> proposing is quite interesting, as if that is true I can ditch this idea and
> use BDR instead.  It is hard to empirically test, though, as I don't have
> the alternate implementation on hand.

Well, I can tell you that for the changeset extraction stuff (which is the
basis for BDR) the biggest bottleneck so far seems to be the CRC
computation when reading the WAL - and that's something plain WAL apply
has to do as well. And it is optimizable.
When actually testing decoding & apply, for workloads fitting into
memory I had to try very hard to construct situations where apply was a
big bottleneck. It is easier for seek bound workloads, where the standby
is less powerful than the primary, since there are more random reads for
those due to full page writes removing the need for reads in many cases.
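
To make the CRC cost concrete, here is a small Python sketch of a reader
that must checksum every record before using it. The record format
(length, CRC, payload) and the names HEADER, encode_record and
read_records are hypothetical, and zlib.crc32 is only a stand-in; this
is not the WAL record format or the CRC routine PostgreSQL uses.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC of the payload.
HEADER = struct.Struct("<II")


def encode_record(payload):
    """Prefix the payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_records(buf):
    """Return payloads in order, stopping at the first invalid record.

    Every payload byte is run through the CRC before the record is
    accepted, so the checksum work grows with the volume of log read.
    """
    records = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        length, crc = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        payload = buf[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # truncated or corrupted record: treat as end of log
        records.append(payload)
        offset = start + length
    return records
```

Any consumer of the log, whether it replays the records physically or
decodes them into logical changes, pays this per-byte cost on the read
path.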

> I think the expectation that performance will be harmed if postgres
> uses 8 byte Oids is not quite correct.
>
> Several years ago I ported postgresql sources to use 64bit everything.
> Oids, varlena headers, variables tracking offsets, etc.  It was a fair
> amount of work, but all the doom and gloom predictions that I have
> heard over the years about how 8-byte varlena headers would kill
> performance, 8-byte Oids would kill performance, etc, turned out to
> be quite inaccurate.

Well, it can increase the size of the database, turning a system where
the hot set fits into memory into one where it doesn't anymore. But
really, the performance concerns were more about the catalog lookups.

Fundamentally, I think there's nothing I see preventing such a scheme
from being implemented - but I think there's about zero chance of it ever
getting integrated, it's just far too invasive with very high costs in
scenarios where it's not used for not all that much gain. Not to speak
of the amount of engineering it would require to implement.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
