Re: shared memory message queues

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: shared memory message queues
Date: 2013-12-07 20:02:11
Message-ID: CA+TgmobKbDKxtDkjxOrNeY=Tw3wDuitABBLc_5Y_EWTWDsmQzg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 6, 2013 at 8:12 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-05 14:07:39 -0500, Robert Haas wrote:
>> On Thu, Dec 5, 2013 at 8:29 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > Hm. The API change of on_shmem_exit() is going to cause a fair bit of
>> > pain. There are a fair number of extensions out there using it, and all
>> > would need to be littered with version-dependent #ifdefs. What about
>> > providing a version of on_shmem_exit() that allows specifying the exit
>> > phase, and making on_shmem_exit() a backward-compatible wrapper?
>>
>> Or, we could have on_user_exit() or so and leave on_shmem_exit() alone
>> altogether, which would probably be transparent for nearly everyone.
>> I kind of like that better, except that I'm not sure whether
>> on_user_exit() is any good as a name.
>
> Leaving it as separate calls sounds good to me as well - but like you I
> don't like on_user_exit() that much. Not sure if I can come up with
> something really good...
> on_shmem_exit_phase() or at_shmem_exit_phase() if we go for a function
> allowing to specify phases, and just before_shmem_exit() for the "user
> level" things?

Hmm, before_shmem_exit() and on_shmem_exit() don't sound bad.

>> > Hm. A couple of questions/comments:
>> > * How do you imagine keys to be used/built?
>> > * Won't the sequential search over toc entries become problematic?
>> > * Some high level overview of how it's supposed to be used wouldn't
>> > hurt.
>> > * the estimator stuff doesn't seem to belong in the public header?
>>
>> The test-shm-mq patch is intended to illustrate these points. In that
>> case, for an N-worker configuration, there are N+1 keys; that is, N
>> message queues and one fixed-size control area. The estimator stuff
>> is critically needed in the public header so that you can size your
>> DSM to hold the stuff you intend to store in it; again, see
>> test-shm-mq.
>
> I still think shm_mq.c needs to explain more of that. That's where I
> look for a brief high-level explanation, no?

That stuff mostly pertains to shm_toc.c, not shm_mq.c. I can
certainly look at expanding the explanation in the former.

> I had a very quick look at shm_mq_receive()/send() and
> shm_mq_receive_bytes()/shm_mq_send_bytes() - while the low-level steps
> seem to be explained fairly well, the high-level design of the queue
> doesn't seem to be explained much. I was at first thinking that you were
> trying to implement a lockless queue (which is quite efficiently
> possible for 1:1 queues), partly because I didn't see any locking in
> them until I noticed it's just delegated to helper functions.

I did initially implement it as lockless, but I figured I needed a
fallback with spinlocks for machines that don't have atomic 8-byte
writes and/or memory barriers, both of which are required for this to
work without locks. When I implemented the fallback, I discovered
that it wasn't any slower than the lock-free implementation, so I
decided to simplify my life (and the code) by not bothering with it.

So the basic idea is that you have to take the spinlock to modify the
count of bytes read - if you're the reader - or written - if you're
the writer. You also need to take the spinlock to read the counter
the other process might be modifying, but you can read the counter
that only you can modify without the lock. You also don't need the
lock to read or write the data in the buffer, because you can only
write to the portion of the buffer that the reader isn't currently
allowed to read, and you can only read from the portion of the buffer
that the writer isn't currently allowed to write.

> I wonder if we couldn't just generally store a "generation" number for
> each PGPROC which is increased every time the slot gets
> reused. Then one could simply check whether mq->mq_sender->generation ==
> mq->mq_sender_generation.

I think you'd want to bump the generation every time the slot was
released rather than when it was reacquired, but it does sound
possibly sensible. However, it wouldn't obviate the separate need to
watch a BackgroundWorkerHandle, because what this lets you do is (say)
write to a queue after you've registered the reader but before it's
actually been started by the postmaster, let alone acquired a PGPROC.
I won't object if someone implements such a system for their needs,
but it's not on my critical path.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
