
Re: PG-MQ?

From:	"Jeroen T(dot) Vermeulen" <jtv(at)xs4all(dot)nl>
To:	"Chris Browne" <cbbrowne(at)acm(dot)org>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: PG-MQ?
Date:	2007-06-20 07:45:57
Message-ID:	7108.125.24.217.75.1182325557.squirrel@webmail.xs4all.nl
Lists:	pgsql-hackers

On Wed, June 20, 2007 04:45, Chris Browne wrote:
> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> <http://en.wikipedia.org/wiki/Message_queue>
>
> The "granddaddy" of message queuing systems is IBM's MQ-Series, and I
> don't see particular value in replicating its functionality.

I'm quite interested in this. Maybe I'm thinking of something too
complex, but I do think there are some "oh it'll need to do that too"
pitfalls that are best considered up front.

The big thing about MQ is that it participates as a resource manager in
two-phase commits (and optionally a transaction manager as well). That
means that you get atomic processing steps: application takes message off
a queue, processes it, commits its changes to the database, replies to
message. The queue manager then does a second-phase commit for all of
those steps, and that's when the reply really goes out. If the
application fails, none of this will have happened so you get ACID over
the complete cycle. That's something we should have free software for.
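
A toy Python sketch of the two-phase protocol described above, to make the cycle concrete. The names (Participant, two_phase_commit) are hypothetical, and this is neither MQ's nor PostgreSQL's API. Everything is in memory; a real resource manager would also have to make its "prepared" state survive a crash.

```python
class Participant:
    """A toy resource manager (a queue or a database) taking part in 2PC."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare  # simulate a participant voting "no"
        self.pending = []    # changes staged in the current transaction
        self.committed = []  # changes made durable by phase two
        self.prepared = False

    def stage(self, change):
        self.pending.append(change)

    def prepare(self):
        # Phase one: promise that a later commit will succeed, or refuse.
        if self.fail_prepare:
            return False
        self.prepared = True
        return True

    def commit(self):
        # Phase two: make the staged changes durable.
        self.committed.extend(self.pending)
        self.pending.clear()
        self.prepared = False

    def rollback(self):
        self.pending.clear()
        self.prepared = False


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes in phase one.

    all() stops asking after the first "no"; everyone is rolled back then.
    """
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

In the message cycle, the queue would stage "take request" and "send reply" while the database stages the application's changes; the reply only really goes out if both vote yes.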

Perhaps the time is right for something new. A lot of the complexity
inside MQ comes from data representation issues like encodings and
fixed-length strings, as I recall, and things have changed since MQ was
designed. I agree it could be useful (and probably not hard either) to
have a transactional messaging system inside the database. It saves you
from having to do two-phase commits.
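
A rough sketch of what "inside the database" buys you, using Python's built-in sqlite3 module as a stand-in for postgres so it runs anywhere. The table names and the "account:delta" message format are made up for illustration. Because the queue and the application data live in the same database, one ordinary transaction covers taking the request, applying it, and queueing the reply, so no two-phase commit is needed.

```python
import sqlite3


def make_db():
    """In-memory stand-in for a database holding both the application data
    and the message queue (hypothetical schema)."""
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE queue (id INTEGER PRIMARY KEY, topic TEXT, body TEXT);
        CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER);
        INSERT INTO balances VALUES ('alice', 100);
        INSERT INTO queue (topic, body) VALUES ('requests', 'alice:-30');
    """)
    return conn


def process_one(conn, crash=False):
    """Take one request off the queue, apply it and queue a reply, all in a
    single transaction: either all three steps happen or none of them do.
    With several concurrent consumers you would also need to lock the row
    being taken; this sketch assumes a single consumer."""
    conn.execute("BEGIN")
    try:
        row = conn.execute(
            "SELECT id, body FROM queue WHERE topic = 'requests' "
            "ORDER BY id LIMIT 1").fetchone()
        if row is None:
            conn.execute("COMMIT")
            return False
        msg_id, body = row
        account, delta = body.split(":")
        conn.execute("DELETE FROM queue WHERE id = ?", (msg_id,))
        conn.execute("UPDATE balances SET amount = amount + ? WHERE account = ?",
                     (int(delta), account))
        if crash:
            raise RuntimeError("simulated failure before commit")
        conn.execute("INSERT INTO queue (topic, body) VALUES ('replies', ?)",
                     ("done %d" % msg_id,))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

If process_one fails halfway, the rollback puts the request back on the queue and undoes the balance change together.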

But it does tie everything to postgres to some extent, and you lose the
interesting features (atomicity and assured, single delivery) as soon as
anything in the chain does anything persistent that does not participate
in the postgres transaction. Perhaps what we really need is more mature
components, with a unified control layer on top. That's how a lot of
successful free software grows. See below.

> On the other side, the "big names" these days are:
>
> a) The Java Messaging Service, which seems to implement *way* more
> options than I'm even vaguely interested in having (notably, lots
> that involve data stores or lack thereof that I do not care to use);

As far as I know, JMS is an API, not a product. You'd still slot some
messaging middleware underneath, such as MQ. That is why MQSeries was
renamed: it fits into the WebSphere suite as the implementing engine
underneath the JMS API. From what I understand, MQ is one of the
"best-of-breed" products that JMS was designed around (Sun's term; a bit
hypey for my taste).

In one way, Java is easy: the last thing you want to get into is yet
another marshaling standard. There are plenty of "standards" to choose
from already, each married to one particular communications mechanism:
RPC, EDI, CORBA, D-Bus, XML-RPC, what have you. Even postgres has its own.
I'd say the most successful mechanism is TCP itself, because it isolates
itself from content representation so effectively.

It's hard not to get into marshaling: someone has to do it, and it's often
a drag to do it in the application, but the way things stand now *any*
choice limits the usefulness of what you're building. That's something
I'd like to see change.

Personally I'd love to see marshaling or low-level data representation
isolated into a mature component that speaks multiple programming
languages on the one hand and multiple data representation formats on the
other. Something the implementers of some of these messaging standards
would want to use to compose their messages, isolating their format
definitions into plugins. Something that would make application writers
stop composing messages in finicky ad-hoc code that fails with unexpected
locales or has trouble with different line breaks.
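
To illustrate the kind of separation I mean, a minimal Python sketch: the application deals in plain records, and each wire format is a plugin registered by name. The Marshaller class, the "kv" format and its rules are all hypothetical, and both plugins only handle flat mappings of strings; a real component would need a proper type system.

```python
import json


class Marshaller:
    """Hypothetical format-agnostic marshaling layer: applications hand it
    plain records, and format plugins decide the wire representation."""

    def __init__(self):
        self._formats = {}

    def register(self, name, encode, decode):
        self._formats[name] = (encode, decode)

    def encode(self, name, record):
        return self._formats[name][0](record)

    def decode(self, name, data):
        return self._formats[name][1](data)


def json_encode(record):
    return json.dumps(record, sort_keys=True).encode("utf-8")


def json_decode(data):
    return json.loads(data.decode("utf-8"))


def kv_encode(record):
    # One "key=value" per line, always "\n" and always UTF-8, whatever the
    # platform. Keys must not contain "=", values must not contain newlines.
    return "".join(f"{k}={v}\n" for k, v in sorted(record.items())).encode("utf-8")


def kv_decode(data):
    # splitlines() accepts "\n", "\r\n" and "\r" alike.
    return dict(line.split("=", 1)
                for line in data.decode("utf-8").splitlines() if line)


marshaller = Marshaller()
marshaller.register("json", json_encode, json_decode)
marshaller.register("kv", kv_encode, kv_decode)
```

A messaging layer built on this would only ever call marshaller.encode/decode with a format name, so adding a format means adding a plugin rather than touching application code.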

If we had a component like that, combining it with existing transactional
variants of TCP and [S]HTTP might even be enough to build a usable
messaging system. I haven't looked at them enough to know. Of course
we'd need implementations of those protocols; see
http://ttcplinux.sourceforge.net/ and http://www.csn.ul.ie/~heathclf/fyp/
for example.

Another box of important tools, and I have no idea where we stand with
this one, is transaction management. We have 2-phase commit in postgres
now. But do we have interoperability with existing transaction managers?
Is there a decent free, portable, everything-agnostic transaction manager?
With those, the sphere of reliability of a database-driven messaging
package could extend much further.

A free XA-capable filesystem would be great too, but I guess I'm daydreaming.

> There tend to be varying semantics out there:
>
> - Some queues may represent "subscriptions" where a whole bunch of
> listeners want to get all the messages;

The two simplest models that offer something more than TCP/UDP are 1:n
reliable publish-subscribe without persistence, and 1:1 request-reply with
persistent storage. D-Bus does them both; IIRC MQ does 1:1 and has
add-ons on top for publish-subscribe.
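
A toy in-process sketch of those two models, with hypothetical class names. "Persistent" is only simulated with a deque here; a real system would write to disk or a table.

```python
from collections import deque


class PubSub:
    """1:n, no persistence: a message reaches whoever is subscribed right
    now, and is simply gone for everyone else."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in list(self.subscribers):
            callback(message)


class RequestQueue:
    """1:1 with storage: a request waits until exactly one consumer takes
    it. Each entry carries the queue the reply should go to."""

    def __init__(self):
        self.pending = deque()  # stand-in for durable storage

    def put(self, body, reply_to=None):
        self.pending.append((body, reply_to))

    def get(self):
        return self.pending.popleft() if self.pending else None
```

A consumer calls get() on the request queue, does its work and then put()s the answer on the reply_to queue it received.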

I could imagine variations such as persistent publish-subscribe, where you
can come back once in a while and see if your subscriptions caught
anything since your last visit. But such things probably get more complex
and less useful as you add more ideas.
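
One way persistent publish-subscribe could work, sketched in Python under my own assumptions: an append-only log per topic plus a per-subscriber read position. TopicLog and its methods are hypothetical. The log here grows forever; trimming entries that every subscriber has already read is exactly the sort of extra complexity mentioned above.

```python
class TopicLog:
    """Persistent publish-subscribe sketch: messages are appended to a log,
    and each subscriber remembers how far it has read."""

    def __init__(self):
        self.log = []      # would be a table or file in a real system
        self.offsets = {}  # subscriber name -> index of next unread message

    def subscribe(self, name):
        # New subscribers start at the current end of the log.
        self.offsets.setdefault(name, len(self.log))

    def publish(self, message):
        self.log.append(message)

    def catch_up(self, name):
        """Return everything published since this subscriber's last visit."""
        start = self.offsets[name]
        self.offsets[name] = len(self.log)
        return self.log[start:]
```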

On top of that comes the communication model: symmetric or asymmetric,
synchronous or asynchronous. Do you end up with a "remote procedure call"
model like RPC, D-Bus, CORBA? Or do you stick with a pure message/event
view of communication? Personally I think it's good not to intrude into
the application's event loop too much, but others seem to feel the central
event loop should not be part of application code.

> - Sometimes you have the semantics where:
> - messages need to be delivered at least once
> - messages need to be delivered no more than once
> - messages need to be delivered exactly once

IMHO, if you're not doing "exactly once," or something very close to it,
you might as well stay with ad-hoc code. You can ensure single delivery
by having the sender re-send when in doubt, and having the recipient keep
track of duplicates.
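
The resend-and-deduplicate scheme in a few lines of Python. This is a simulation: a seeded random number generator decides which transmissions get lost, and all names are hypothetical. For it to hold up across crashes, the receiver's record of seen ids would have to be stored in the same transaction as the work it protects.

```python
import random


class Receiver:
    """Remembers which message ids it has already applied, so a resent
    duplicate is acknowledged again but not processed twice."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def deliver(self, msg_id, body):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.processed.append(body)
        return msg_id  # the acknowledgement


def send_reliably(receiver, msg_id, body, rng, loss_rate=0.5, max_tries=100):
    """Resend until an ack arrives. Either the message or its ack may be
    lost, so the receiver may see the same message several times."""
    for _ in range(max_tries):
        if rng.random() < loss_rate:
            continue  # message lost on the way: resend
        ack = receiver.deliver(msg_id, body)
        if rng.random() < loss_rate:
            continue  # ack lost: sender is in doubt, so it resends
        return ack == msg_id
    return False
```

The sender alone only gives "at least once"; it's the receiver's duplicate check that turns that into effectively "exactly once".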

> Is there any existing work out there on this? Or should I maybe be
> looking at prototyping something?

I've looked around a bit (not much) and not found anything very generally
useful. I think it's an exciting area that probably needs work, so
prototyping might be a good idea. If nothing else, I hope I've given you
some examples of what you don't want to get yourself into. :-)

Jeroen

In response to

PG-MQ? at 2007-06-19 21:45:16 from Chris Browne

Responses

Re: PG-MQ? at 2007-06-20 10:58:20 from Marko Kreen
