Re: shared memory message queues

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: shared memory message queues
Date: 2013-12-05 15:26:51
Message-ID: CADyhKSUnSh+TKbFLb4B48wKb8RyepRs_p9QoKQOWo_bd8vOefQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for my late.

I checked the part-3 (shm-mq-v1.patc) portion as below.
Your comments towards part-1 and part-2 are reasonable for me,
so I have no argue on this portion.

Even though shm_mq_create() expects the "address" is aligned,
however, no mechanism to ensure. How about to put Assert() here?

shm_mq_send_bytes() has an Assert() to check the length to be
added to mq_bytes_written. Is the MAXALIGN64() right check in
32-bit architecture?
The sendnow value might be Min(ringsize - used, nbytes - sent),
and the ringsize came from mq->mq_ring_size being aligned
using MAXALIGN(_DOWN) that has 32bit width.

/*
* Update count of bytes written, with alignment padding. Note
* that this will never actually insert any padding except at the
* end of a run of bytes, because the buffer size is a multiple of
* MAXIMUM_ALIGNOF, and each read is as well.
*/
Assert(sent == nbytes || sendnow == MAXALIGN64(sendnow));
shm_mq_inc_bytes_written(mq, MAXALIGN64(sendnow));

What will happen if sender tries to send a large chunk that needs to
be split into multiple sub-chunks and receiver concurrently detaches
itself from the queue during the writes by sender?
It seems to me the sender gets SHM_MQ_DETACHED and only
earlier half of the chunk still remains on the queue even though
its total length was already in the message queue.
It may eventually lead infinite loop on the receiver side when another
receiver appeared again later, then read incomplete chunk.
Does it a feasible scenario? If so, it might be a solution to prohibit
enqueuing something without receiver, and reset queue when a new
receiver is attached.

The source code comments in shm_mq_wait_internal() is a little bit
misleading for me. It says nothing shall be written without attaching
the peer process, however, we have no checks as long as nsend is
less than queue size.

* If handle != NULL, the queue can be read or written even before the
* other process has attached. We'll wait for it to do so if needed. The
* handle must be for a background worker initialized with bgw_notify_pid
* equal to our PID.

Right now, that's all I can comment on. I'll do follow-up code reading
in the weekend.

Thanks,

2013/11/20 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Tue, Nov 19, 2013 at 12:33 AM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:
>> * on-dsm-detach-v2.patch
>> It reminded me the hook registration/invocation mechanism on apache/httpd.
>> It defines five levels for invocation order (REALLY_FIRST, FIRST, MIDDLE,
>> LAST, REALLY_LAST), but these are alias of integer values, in other words,
>> they are just kicks the hook in order to the priority value associated with a
>> function pointer.
>> These flexibility may make sense. We may want to control the order to
>> release resources more fine grained in the future. For example, isn't it
>> a problem if we have only two levels when a stuff in a queue needs to be
>> released earlier than the queue itself?
>> I'm not 100% certain on this suggestion because on_shmen_exit is a hook
>> that does not host so much callbacks, thus extension may implement
>> their own way on the SHMEM_EXIT_EARLY or SHMEM_EXIT_LATE stage
>> of course.
>
> I don't really see much point in adding more flexibility here until we
> need it, but I can imagine that we might someday need it, for reasons
> that are not now obvious.
>
>> * shm-toc-v1.patch
>>
>> From my experience, it makes sense to put a magic number on the tail of
>> toc segment to detect shared-memory usage overrun. It helps to find bugs
>> bug hard to find because of concurrent jobs.
>> Probably, shm_toc_freespace() is a point to put assertion.
>>
>> How many entries is shm_toc_lookup() assumed to chain?
>> It assumes a liner search from the head of shm_toc segment, and it is
>> prerequisite for lock-less reference, isn't it?
>> Does it make a problem if shm_toc host many entries, like 100 or 1000?
>> Or, it is not an expected usage?
>
> It is not an expected usage. In typical usage, I expect that the
> number of TOC entries will be about N+K, where K is a small constant
> (< 10) and N is the number of cooperating parallel workers. It's
> possible that we'll someday be in a position to leverage 100 or 1000
> parallel workers on the same task, but I don't expect it to be soon.
> And, actually, I doubt that revising the data structure would pay off
> at N=100. At N=1000, maybe. At N=10000, probably. But we are
> *definitely* not going to need that kind of scale any time soon, and I
> don't think it makes sense to design a complex data structure to
> handle that case when there are so many more basic problems that need
> to be solved before we can go there.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-12-05 15:34:16 Re: Performance optimization of btree binary search
Previous Message Robert Haas 2013-12-05 15:25:54 Re: Proposal: variant of regclass