Re: Dynamic Shared Memory stuff

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Dynamic Shared Memory stuff
Date: 2013-12-05 19:34:24
Message-ID: CA+TgmoayUzQ6Kjs5osEV+JNpVvK=b3mDg=dLDeiTeFJ+97BNRA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 5, 2013 at 11:12 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> Hmm. Those two use cases are quite different. For message-passing, you want
> a lot of small queues, but for parallel sort, you want one huge allocation.
> I wonder if we shouldn't even try a one-size-fits-all solution.
>
> For message-passing, there isn't much need to even use dynamic shared
> memory. You could just assign one fixed-sized, single-reader multiple-writer
> queue for each backend.
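
Concretely, I read that as something like the sketch below (the struct
layout and every name in it are mine, purely to pin down what we're
talking about):

    #include "postgres.h"
    #include "storage/s_lock.h"

    #define PER_BACKEND_QUEUE_SIZE 8192     /* size picked arbitrarily */

    typedef struct
    {
        slock_t     mutex;          /* serializes the multiple writers */
        uint64      read_off;       /* advanced by the owning backend */
        uint64      write_off;      /* advanced by whichever writer */
        char        data[PER_BACKEND_QUEUE_SIZE];   /* ring buffer */
    } BackendQueue;

    /* one queue per backend, carved out of the main segment at startup */
    BackendQueue *BackendQueues;    /* array of MaxBackends entries */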

True, although if the queue needs to be 1MB, or even 128kB, that would
bloat the server's static shared-memory footprint pretty
significantly. And I don't know that a small queue will be adequate in
all cases. If you've got a worker backend feeding data back to the
user backend, the size of the queue limits how far ahead of the user
backend that worker can get. Big is good, because then the user
backend won't stall on read; but small is also good, because less
queued-up work is thrown away if the query is cancelled or hits an
error. It is far from obvious to me that one-size-fits-all is the
right solution.
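
To put rough numbers on the footprint point: with the default
max_connections of 100, a 1MB queue per backend means 100MB of shared
memory nailed down at startup whether or not it ever gets used, and
even 128kB apiece works out to 12.5MB.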

> For parallel sort, you'll want to utilize all the available memory and all
> CPUs for one huge sort. So all you really need is a single huge shared
> memory segment. If one process is already using that 512GB segment to do a
> sort, you do *not* want to allocate a second 512GB segment. You'll want to
> wait for the first operation to finish first. Or maybe you'll want to have
> 3-4 somewhat smaller segments in use at the same time, but not more than
> that.

This is all true, but it has basically nothing to do with parallelism.
work_mem is a poor model, but I didn't invent it. Hopefully some day
someone will fix it, maybe even me, but that's a separate project.

> I really think we need to do something about it. To use your earlier example
> of parallel sort, it's not acceptable to permanently leak a 512 GB segment
> on a system with 1 TB of RAM.
>
> One idea is to create the shared memory object with shm_open, and wait until
> all the worker processes that need it have attached to it. Then,
> shm_unlink() it, before using it for anything. That way the segment will be
> automatically released once all the processes close() it, or die. In
> particular, kill -9 will release it. (This is a variant of my earlier idea
> to create a small number of anonymous shared memory file descriptors in
> postmaster startup with shm_open(), and pass them down to child processes
> with fork()). I think you could use that approach with SysV shared memory as
> well, by destroying the segment with shmctl(IPC_RMID) immediately after all
> processes have attached to it.

That's a very interesting idea. I've been thinking that we needed to
preserve the property that new workers could attach to the shared
memory segment at any time, but that might not be necessary in all
cases. We could introduce a new dsm operation that means "I promise no
one else needs to attach to this segment". Further attachments would
be disallowed by dsm.c regardless of the implementation in use, and
dsm_impl.c would also be given a chance to perform
implementation-specific operations, like shm_unlink and
shmctl(IPC_RMID). This new operation, when used, would help to reduce
the chance of leaks and perhaps catch other programming errors as
well.

What should we call it? dsm_finalize() is the first thing that comes
to mind, but I'm not sure I like that.
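
Whatever we end up calling it, usage would look something like this
sketch (dsm_finalize here is just a stand-in for the eventual name,
and segment_size is a placeholder):

    dsm_segment *seg = dsm_create(segment_size);

    /* ... pass dsm_segment_handle(seg) to each worker, and wait for
     * all of them to dsm_attach() it ... */

    dsm_finalize(seg);  /* from here, dsm.c refuses new attaches, and
                         * dsm_impl.c can shm_unlink() the POSIX
                         * object or shmctl(IPC_RMID) the SysV one */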

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
