Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Daniel Farina <daniel(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
Date:	2012-06-20 17:40:05
Message-ID:	201206201940.06189.andres@2ndquadrant.com
Lists:	pgsql-hackers

On Wednesday, June 20, 2012 07:17:57 PM Robert Haas wrote:
> On Wed, Jun 20, 2012 at 12:53 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I would prefer the event trigger way because that seems to be the most
> > extensible/reusable. It would allow fully replicated databases and
> > catalog-only instances.
> > I think we need to design event triggers in a way that you cannot simply
> > circumvent them. We already have the case that if users try to screw
> > around with system triggers we give back wrong answers with the planner
> > relying on foreign keys btw.
> > If the problem is having user triggers after system triggers: Let's make
> > that impossible. Forbidding DDL on the other instances once we have that
> > isn't that hard.
>
> So, this is interesting. I think something like this could solve the
> problem, but then why not just make it built-in code that runs from
> the same place as the event trigger rather than using the trigger
> mechanism per se? Presumably the "trigger" code's purpose is going to
> be to inject additional data into the WAL stream (am I wrong?) which
> is not something you're going to be able to do from PL/pgsql anyway,
> so you don't really need a trigger, just a call to some C function -
> which not only has the advantage of being not bypassable, but is also
> faster.
I would be totally fine with that. As long as event triggers provide the
infrastructure, that shouldn't be a big problem.

> > Perhaps all that will get simpler if we can make reading the catalog via
> > custom built snapshots work as you proposed elsewhere in this thread.
> > That would make checking errors much easier even if you just want to
> > apply to a database with exactly the same schema. That's the next thing I
> > plan to work on.
> I realized a problem with that idea this morning: it might work for
> reading things, but if anyone attempts to write data you've got big
> problems. Maybe we could get away with forbidding that, not sure.
Hm, why is writing a problem? You mean I/O conversion routines writing data?
Yes, that will be a problem. I am fine with simply forbidding that; we should
be able to catch it and provide a sensible error message, since we have had
the support for that since SSI.

> Would be nice to get some input from other hackers on this.
Oh, yes!

> > I agree that the focus isn't 100% optimal and that there are *loads* of
> > issues we haven't even started to look at. But you need a point to
> > start and extraction & apply seems to be a good one because you can
> > actually test it without the other issues solved which is not really the
> > case the other way round.
> > Also it's possible to plug in the newly built changeset extraction into
> > existing solutions to make them more efficient while retaining most of
> > their respective framework.
> >
> > So I disagree that that's the wrong part to start with.
>
> I think extraction is a very sensible place to start; actually, I
> think it's the best possible place to start. But this particular
> thread is about adding origin_ids to WAL, which I think is definitely
> not the best place to start.
Yep. I think the reason everyone started with it is that the patch was actually
really simple ;).
Note that the WAL enrichment & decoding patches came before the origin_id
patch in the patch series ;)

> > I definitely do want to provide code that generates a textual
> > representation of the changes. As you say, even if it's not used for
> > anything it's needed for debugging. Not sure if it should be SQL or maybe
> > the new Slony representation. If that's provided and reusable it should
> > make sure that other solutions can be built on top of that.
> Oh, yeah. If we can get that, I will throw a party.
Good ;)

> > I find your supposition that I/we just want to get MMR without regard for
> > anything else a bit offensive. I wrote at least three times in this
> > thread that I do think it's likely that we will not get more than the
> > minimal basis for implementing MMR into 9.3. I wrote multiple times that
> > I want to provide the basis for multiple solutions. The prototype -
> > while obviously being incomplete - tried hard to be modular.
> > You cannot blame us for wanting the work we do to be *also* usable for
> > what is one of our major aims?
> > What can I do to convince you/others that I am not planning to do
> > something "evil" but that I try to reach as many goals at once as
> > possible?
> Sorry. I don't think you're planning to do something evil, but before
> I thought you said you did NOT want to write the code to extract
> changes as text or something similar.
Hm. I might have been a bit ambiguous when saying that I do not want to
provide everything for that use-case.
Once we have a call point that has a correct catalog snapshot for exactly the
tuple in question, text conversion is damn near trivial. The point where you
get passed all that information (action, tuple, table, snapshot) is the one I
think the patch should mainly provide.
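
To illustrate why text conversion is nearly trivial once such a call point
exists, here is a minimal standalone sketch. It is not taken from the patch
and uses no PostgreSQL APIs: ChangeAction, TableDesc, DecodedTuple and
change_to_text are hypothetical names invented for this example. It assumes
the catalog snapshot has already been used to resolve the table name, the
column names and the per-column text values, which is the part the call
point would have to make possible.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for illustration only, not PostgreSQL's. A real
 * callback would get a heap tuple plus a catalog snapshot and would look
 * up names and type output functions in the catalog. */
typedef enum { CHANGE_INSERT, CHANGE_UPDATE, CHANGE_DELETE } ChangeAction;

typedef struct {
    const char *table_name;           /* resolved via the snapshot (assumed) */
    int ncolumns;
    const char *const *column_names;  /* resolved via the snapshot (assumed) */
} TableDesc;

typedef struct {
    const char *const *values;        /* text form per column; NULL = SQL NULL */
} DecodedTuple;

static const char *action_name(ChangeAction action)
{
    switch (action) {
    case CHANGE_INSERT: return "INSERT";
    case CHANGE_UPDATE: return "UPDATE";
    case CHANGE_DELETE: return "DELETE";
    }
    return "UNKNOWN";
}

/* Render one change as a single line of text into buf.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if the line does not fit into buflen bytes.
 * Values are emitted as-is; quoting and escaping are left out. */
int change_to_text(char *buf, size_t buflen, ChangeAction action,
                   const TableDesc *table, const DecodedTuple *tuple)
{
    size_t used;
    int n;

    assert(buf != NULL && buflen > 0 && table != NULL && tuple != NULL);

    n = snprintf(buf, buflen, "%s %s:", action_name(action),
                 table->table_name);
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    for (int i = 0; i < table->ncolumns; i++) {
        const char *val = tuple->values[i];

        n = snprintf(buf + used, buflen - used, " %s=%s",
                     table->column_names[i], val ? val : "NULL");
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }
    return (int) used;
}
```

Called with CHANGE_INSERT, a table "accounts" with columns id and name, and
the values "1" and SQL NULL, this produces the line
"INSERT accounts: id=1 name=NULL". The formatting is the easy part; the
sketch deliberately leaves out how the names and text values are obtained,
since that is what the correct catalog snapshot is needed for.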

> I think that would be a really
> bad thing to skip for all kinds of reasons. I think we need that as a
> foundational technology before we do much else. Now, once we have
> that, if we can safely detect cases where it's OK to bypass decoding
> to text and skip it in just those cases, I think that's great
> (although possibly difficult to implement correctly). I basically
> feel that without decode-to-text, this can't possibly be a basis for
> multiple solutions; it will be a basis only for itself, and extremely
> difficult to debug, too. No other replication solution can even
> theoretically have any use for the raw on-disk tuple, at least not
> without horrible kludgery.
We need a simple decode-to-text feature. Agreed.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node at 2012-06-20 17:17:57 from Robert Haas

Responses

Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node at 2012-06-20 17:50:37 from Robert Haas
Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node at 2012-06-20 17:51:02 from Simon Riggs
