
From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org, Daniel Farina <daniel(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
Date: 2012-06-20 16:53:01
Message-ID: 201206201853.02434.andres@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On Wednesday, June 20, 2012 05:44:09 PM Robert Haas wrote:
> On Wed, Jun 20, 2012 at 10:02 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> > We're not the only ones here who are performing scope creep, though... I
> > think just about everyone who has posted in this thread, except maybe
> > Tom and Marko, is guilty of doing so.
> >
> > I still think it's rather sensible to focus on exactly duplicated schemas
> > in a very first version, just because that leaves out some of the
> > complexity while paving the road for other nice things.
>
> Well, I guess what I want to know is: what does focusing on exactly
> duplicated schemas mean? If it means we'll disable DDL for tables
> when we turn on replication, that's basically the Slony approach: when
> you want to make a DDL change, you have to quiesce replication, do it,
> and then resume replication. I would possibly be OK with that
> approach. If it means that we'll hope that the schemas are duplicated
> and start spewing garbage data when they're not, then I'm definitely
> not OK with that approach. If it means using event
> triggers to keep the catalogs synchronized, then I don't think that's
> adequately robust. The user could add more event
> triggers that run before or after the ones the replication system
> adds, and then you are back to garbage decoding (or crashes).
I would prefer the event trigger way because that seems to be the most
extensible/reusable. It would allow both fully replicated databases and
catalog-only instances.
I think we need to design event triggers in a way that you cannot simply
circumvent them. We already have the precedent, btw, that if users screw
around with system triggers we give back wrong answers, because the planner
relies on foreign keys being enforced.
If the problem is user triggers running after system triggers: let's make
that impossible. Forbidding DDL on the other instances once we have that
isn't that hard.

Perhaps all that will get simpler if we can make reading the catalog via
custom-built snapshots work, as you proposed elsewhere in this thread. That
would make error checking much easier, even if you just want to apply changes
to a database with exactly the same schema. That's the next thing I plan to
work on.

> They could also modify the catalogs directly, although it's possible we
> don't care quite as much about that case (but on the other hand people
> do sometimes need to do it to solve real problems).
With that you can already crash the database perfectly fine today. I think
trying to cater to that case is a waste of time.

> Although I am
> 100% OK with paring back the initial feature set - indeed, I strongly
> recommend it - I think that robustness is not a feature which can be
> left out in v1 and added in later. All the robustness has to be
> designed in at the start, or we will never have it.
I definitely don't intend to cut down on robustness.

> On the whole, I think we're spending far too much time talking about
> code and far too little time talking about what the overall design
> should look like.
Agreed.

> We are having a discussion about whether or not MMR
> should be supported by sticking a 16-bit node ID into every WAL record
> without having first decided whether we should support MMR, whether
> that requires node IDs, whether they should be integers, whether those
> integers should be 16 bits in size, whether they should be present in
> WAL, and whether or not the record header is the right place to put
> them. There's a right order in which to resolve those questions, and
> this isn't it. More generally, I think there is a ton of complexity
> that we're probably overlooking here in focusing in on specific coding
> details. I think the most interesting comment made to date is Steve
> Singer's observation that very little of Slony is concerned with
> changeset extraction or apply. Now, on the flip side, all of these
> patches seem to be concerned with changeset extraction and apply.
> That suggests that we're missing some pretty significant pieces
> somewhere in this design. I think those pieces are things like error
> recovery, fault tolerance, user interface design, and control logic.
> Slony has spent years trying to get those things right. Whether or
> not they actually have gotten them right is of course an arguable
> point, but we're unlikely to do better by ignoring all of those issues
> and implementing whatever is most technically expedient.
I agree that the focus isn't 100% optimal and that there are *loads* of issues
we haven't even started to look at. But you need a point to start, and
extraction & apply seems to be a good one, because you can actually test it
without the other issues being solved, which is not really the case the other
way round.
Also, it's possible to plug the newly built changeset extraction into
existing solutions to make them more efficient while retaining most of their
respective frameworks.

So I disagree that that's the wrong part to start with.
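For readers following the node-id subthread, a toy sketch may help show what
is actually being argued about. The layout below is entirely invented (it is
not the real XLogRecord header), but it illustrates the property a 16-bit
origin field in every WAL record would buy: a consumer can tell where a
record originated without decoding its payload.

```python
import struct

# Hypothetical fixed-size record header, for illustration only.
# Fields (all made up): total length, xid, info/rmgr bits, origin node id.
HEADER_FMT = "<IIHH"

def make_record(tot_len, xid, info_rmid, origin):
    """Pack a toy record header; 'origin' is the 16-bit id of the node
    that generated the change."""
    return struct.pack(HEADER_FMT, tot_len, xid, info_rmid, origin)

def origin_of(header_bytes):
    """A downstream consumer can read the origin straight from the header,
    i.e. filter records without looking at the payload at all."""
    return struct.unpack(HEADER_FMT, header_bytes)[3]

local_rec = make_record(64, 1001, 0, 1)    # change generated on node 1
remote_rec = make_record(64, 1002, 0, 2)   # change replayed from node 2

# An MMR apply loop would skip records whose origin equals the local node
# id, preventing changes from looping back to where they came from.
print(origin_of(local_rec), origin_of(remote_rec))
```

Whether that field belongs in the record header, elsewhere in WAL, or nowhere
is exactly the open question in this thread; the sketch only shows the
record-level filtering it would enable.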

> >> You've got four people objecting to this patch now, all of whom happen
> >> to be committers. Whether or not MMR goes into core, who knows, but
> >> it doesn't seem that this patch is going to fly.
> >
> > I find that a bit too early to say. Sure, it won't fly exactly as
> > proposed, but hell, who cares? What I want to get in is a solution to
> > the specific problem the patch targets. At least you have - not sure
> > about others - accepted that the problem needs a solution.
> > We do not yet agree on what that solution should look like, but that's
> > not exactly surprising, as we started discussing the problem only a
> > good day ago.
> Oh, no argument with any of that. I strongly object to the idea of
> shoving this patch through as-is, but I don't object to solving the
> problem in some other, more appropriate way. I think that won't look
> much like this patch, though; it will be some new patch.
No problem with that.

> > If people agree that your proposed way of just one flag bit is the way to
> > go, we will have to live with that. But that's different from saying the
> > whole thing is dead.
> I think you've convinced me that a single flag-bit is not enough, but
> I don't think you've convinced anyone that it belongs in the record
> header.
Not totally happy, but OK with it. As I just wrote to Kevin, that just
makes things harder, because you need to reassemble transactions before
filtering, which is a shame in my opinion.
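A small sketch of the cost being lamented here (illustrative Python; names
like `ReorderBuffer` are borrowed loosely from the apply-cache concept, the
code is invented): without per-record origin information, changes must be
buffered per transaction and can only be filtered as a whole at commit time.

```python
# Toy apply-cache: changes are buffered by xid; only at commit, when the
# transaction's origin becomes known, can we decide to apply or drop it.
from collections import defaultdict

LOCAL_NODE = 1   # assumed id of this node

class ReorderBuffer:
    def __init__(self):
        self.by_xid = defaultdict(list)

    def add_change(self, xid, change):
        # We cannot filter here: the origin isn't known per change.
        self.by_xid[xid].append(change)

    def commit(self, xid, origin):
        changes = self.by_xid.pop(xid, [])
        if origin == LOCAL_NODE:
            # Our own changes coming back around: drop the whole txn.
            return []
        return changes

buf = ReorderBuffer()
buf.add_change(7, "INSERT ...")
buf.add_change(7, "UPDATE ...")
print(buf.commit(7, origin=2))   # remote transaction: gets applied
```

With an origin id available per record, the `add_change` step could discard
local changes immediately instead of buffering them until commit.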

> >> As I would rather see this project
> >> succeed, I recommend that you don't do that. Both you and Andres seem
> >> to believe that MMR is a reasonable first target to shoot at, but I
> >> don't think anyone else - including the Slony developers who have
> >> commented on this issue - endorses that position.
> >
> > I don't think we'll get full MMR into 9.3. What I am proposing is that we
> > build in the few pieces that are required to implement MMR *on top* of
> > what's hopefully in 9.3.
> > And I think that's a realistic goal.
>
> I can't quite follow that sentence, but my general sense is that,
> while you're saying that this infrastructure will be reusable by other
> projects, you don't actually intend to expose APIs that they can use.
> IOW, we'll give you an apply cache - which we believe to be necessary
> to extract tuples as text - but we're leaving the exercise of actually
> generating those tuples as text as an exercise for the reader. I find
> that a highly undesirable plan. First, if we don't actually have the
> infrastructure to extract tuples as text, then the contention that the
> infrastructure is adequate for that purpose can't be proven or
> disproven. Second, until someone from one of those other projects (or
> elsewhere in the community) actually goes and builds it, the built-in
> logical replication will be the only thing that can get benefit out of
> the new infrastructure. I think it's completely unacceptable to give
> an unproven built-in logical replication technology that kind of pride
> of place out of the gate. That potentially allows it to supplant
> systems such as Slony and Bucardo even if it is in many respects
> inferior, just because it's been given an inside track. They have
> lived without core support for years, and if we're going to start
> adding core support for replication, we ought to start by adding the
> things that they think, on the basis of their knowledge and
> experience, are the most important places where core support is
> needed, not going off in a completely new and untested direction.
> Third, when logical replication fails, which it will, because even
> simple things fail and replication is complicated, how am I going to
> debug it? A raw dump of the tuple data that's being shipped around?
> No thanks.
> IOW, what I see you proposing is, basically, let's short-cut the hard
> problems so we can get to MMR faster. I oppose that. That's not the
> Postgres way of building features. We start slow and incremental and
> we make each thing as solid as we possibly can before going on to the
> next thing.
No, I am not saying that I just want to provide some untested base modules and
leave it at that. I am saying that I don't think providing a full-fledged
framework for implementing basically arbitrary replication solutions from the
get-go is a sane goal for something that should be finished someday. Even less
so if that implementation is something that will be discussed on -hackers and
needs people to agree.
I definitely do want to provide code that generates a textual representation
of the changes. As you say, even if it's not used for anything else, it's
needed for debugging. I'm not sure if it should be SQL or maybe the new Slony
representation.
If that's provided and reusable, it should ensure that other solutions can be
built on top of it.
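As a sketch of what "a textual representation of the changes" might look like
(the function and format here are invented for illustration; whether the real
output should be SQL, a Slony-style format, or something else is exactly the
open question above):

```python
# Toy change-to-text converter: render a decoded change as a SQL statement
# a human can read while debugging, or another tool can re-apply.
def change_to_text(action, table, columns):
    """columns is an ordered mapping of column name -> Python value."""
    cols = ", ".join(columns)
    vals = ", ".join(repr(v) for v in columns.values())
    if action == "INSERT":
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    raise NotImplementedError(action)

print(change_to_text("INSERT", "public.accounts", {"id": 1, "name": "a"}))
```

Even this trivial form shows the debugging value: when replication misbehaves,
a stream of readable statements beats a raw dump of tuple data.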

I find your supposition that I/we just want to get MMR in without regard for
anything else a bit offensive. I wrote at least three times in this thread
that I think it's likely we will not get more than the minimal basis for
implementing MMR into 9.3. I wrote multiple times that I want to provide the
basis for multiple solutions. The prototype - while obviously incomplete -
tried hard to be modular.
How can you blame us for wanting the work we do to *also* be usable for one
of our major aims?
What can I do to convince you/others that I am not planning to do something
"evil", but that I am trying to reach as many goals at once as possible?

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
