From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: dynamic shared memory |
Date: | 2013-08-27 14:07:33 |
Message-ID: | 20130827140733.GD24807@alap2.anarazel.de |
Lists: | pgsql-hackers |
Hi Robert,
[just sending an email which sat in my outbox for two weeks]
On 2013-08-13 21:09:06 -0400, Robert Haas wrote:
> ...
Nice to see this coming. I think it will actually be interesting for
quite some things outside parallel query, but we'll see.
I've not yet looked at the code, so I just have some high-level comments
so far.
> To help solve these problems, I invented something called the "dynamic
> shared memory control segment". This is a dynamic shared memory
> segment created at startup (or reinitialization) time by the
> postmaster before any user processes are created. It is used to store a
> list of the identities of all the other dynamic shared memory segments
> we have outstanding and the reference count of each. If the
> postmaster goes through a crash-and-reset cycle, it scans the control
> segment and removes all the other segments mentioned there, and then
> recreates the control segment itself. If the postmaster is killed off
> (e.g. kill -9) and restarted, it locates the old control segment and
> proceeds similarly.
So any corruption in that area will prevent a restart without a reboot,
unless you use ipcrm or something similar, right?
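To make the concern concrete, here is a minimal sketch of such a control
segment. It is not the patch's code: the struct layout, the magic number,
MAX_SLOTS, and every function name below are made up for illustration. The
point is only that a restarting postmaster has to trust the contents before
it can clean up, and if a sanity check fails the listed segments stay behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; none of these names come from the patch. */
#define CONTROL_MAGIC 0x44534d43u   /* arbitrary sanity marker */
#define MAX_SLOTS 8

typedef struct {
    uint32_t handle;    /* identity of a dynamic segment, 0 = unused slot */
    uint32_t refcnt;    /* number of backends currently attached */
} control_slot;

typedef struct {
    uint32_t magic;
    uint32_t nslots;
    control_slot slots[MAX_SLOTS];
} control_segment;

/* Example destroy callback that only counts how often it was called. */
static int destroyed_count = 0;
static void count_destroy(uint32_t handle)
{
    (void) handle;
    destroyed_count++;
}

/* (Re)create an empty control segment. */
static void control_init(control_segment *c)
{
    c->magic = CONTROL_MAGIC;
    c->nslots = MAX_SLOTS;
    for (uint32_t i = 0; i < MAX_SLOTS; i++) {
        c->slots[i].handle = 0;
        c->slots[i].refcnt = 0;
    }
}

/* Record a new segment; returns the slot index, or -1 if all slots are used. */
static int control_register(control_segment *c, uint32_t handle)
{
    for (uint32_t i = 0; i < c->nslots; i++) {
        if (c->slots[i].handle == 0) {
            c->slots[i].handle = handle;
            c->slots[i].refcnt = 1;
            return (int) i;
        }
    }
    return -1;
}

/* Cheap check a restarting postmaster could run before trusting the contents. */
static int control_is_sane(const control_segment *c)
{
    return c->magic == CONTROL_MAGIC && c->nslots <= MAX_SLOTS;
}

/*
 * Crash-and-reset path: destroy every listed segment, then reinitialize.
 * Returns the number of segments removed, or -1 if the control segment
 * looks corrupt, in which case nothing is removed.
 */
static int control_cleanup(control_segment *c, void (*destroy)(uint32_t handle))
{
    int removed = 0;

    if (!control_is_sane(c))
        return -1;
    for (uint32_t i = 0; i < c->nslots; i++) {
        if (c->slots[i].handle != 0) {
            destroy(c->slots[i].handle);
            removed++;
        }
    }
    control_init(c);
    return removed;
}
```

The -1 path is the case I am asking about: once the control data cannot be
trusted, something outside the server has to remove the leftover segments.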
> Creating a shared memory segment is a somewhat operating-system
> dependent task. I decided that it would be smart to support several
> different implementations and to let the user choose which one they'd
> like to use via a new GUC, dynamic_shared_memory_type.
I think we want that during development, but I'd rather not go there
when releasing. After all, we don't support a manual choice between
anonymous mmap/sysv shmem either.
> In addition, I've included an implementation based on mmap of a plain
> file. As compared with a true shared memory implementation, this
> obviously has the disadvantage that the OS may be more likely to
> decide to write back dirty pages to disk, which could hurt
> performance. However, I believe it's worthy of inclusion all the
> same, because there are a variety of situations in which it might be
> more convenient than one of the other implementations. One is
> debugging.
Hm. Not sure what's the advantage over a corefile here.
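For reference, the mmap-of-a-plain-file approach boils down to something like
the sketch below. This is an assumption about the general technique, not the
patch's implementation; file_segment_map() and its parameters are
hypothetical names. Two MAP_SHARED mappings of the same file see each other's
writes, and the file stays visible in the filesystem, which is where the
claimed inspectability comes from.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open (and optionally create and size) a plain file,
 * then map it shared. Returns NULL on failure.
 */
static void *file_segment_map(const char *path, size_t size, int create)
{
    int flags = O_RDWR | (create ? O_CREAT : 0);
    int fd = open(path, flags, 0600);
    void *p;

    if (fd < 0)
        return NULL;
    if (create && ftruncate(fd, (off_t) size) != 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);      /* the mapping remains valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: drop the mapping; returns 0 on success. */
static int file_segment_unmap(void *p, size_t size)
{
    return munmap(p, size);
}
```

A creator would call file_segment_map(path, size, 1), and any other process
that knows the path would attach with create set to 0.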
> On MacOS X, for example, there seems to be no way to list
> POSIX shared memory segments, and no easy way to inspect the contents
> of either POSIX or System V shared memory segments.
Shouldn't we ourselves know which segments are around?
> Another use case
> is working around an administrator-imposed or OS-imposed shared memory
> limit. If you're not allowed to allocate shared memory, but you are
> allowed to create files, then this implementation will let you use
> whatever facilities we build on top of dynamic shared memory anyway.
I don't think we should try to work around limits like that.
> A third possible reason to use this implementation is
> compartmentalization. For example, you can put the directory that
> stores the dynamic shared memory segments on a RAM disk - which
> removes the performance concern - and then do whatever you like with
> that directory: secure it, put filesystem quotas on it, or sprinkle
> magic pixie dust on it. It doesn't even seem out of the question that
> there might be cases where there are multiple RAM disks present with
> different performance characteristics (e.g. on NUMA machines) and this
> would provide fine-grained control over where your shared memory
> segments get placed. To make a long story short, I won't be crushed
> if the consensus is against including this, but I think it's useful.
-1 so far. Seems a bit handwavy to me.
> Other implementations are imaginable but not implemented here. For
> example, you can imagine using the mmap() of an anonymous file.
> However, since the point is that these segments are created on the fly
> by individual backends and then shared with other backends, that gets
> a little tricky. In order for the second backend to map the same
> anonymous shared memory segment that the first one mapped, you'd have
> to pass the file descriptor from one process to the other.
It wouldn't even work. Several mappings of /dev/zero et al. do *not*
result in the same virtual memory being mapped. Not even when using the
same (passed around) fd.
Believe me, I tried ;)
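A minimal sketch of that experiment, assuming a Unix-like system with a
readable and writable /dev/zero; the function name is made up. It maps
/dev/zero twice through the same fd with MAP_SHARED, writes through the first
mapping, and reports what the second mapping sees.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical test helper: returns the first byte visible through the
 * second mapping after `value` was written through the first one, or -1
 * if the setup failed.
 */
static int devzero_second_mapping_sees(unsigned char value)
{
    const size_t len = 4096;
    int fd = open("/dev/zero", O_RDWR);
    unsigned char *a;
    unsigned char *b;
    int seen;

    if (fd < 0)
        return -1;
    a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, len);
        if (b != MAP_FAILED)
            munmap(b, len);
        return -1;
    }

    a[0] = value;       /* write through the first mapping ... */
    seen = b[0];        /* ... and read back through the second */

    munmap(a, len);
    munmap(b, len);
    return seen;
}
```

If the two mappings shared memory, the function would return the value that
was written. On Linux I would expect it to return 0 for any nonzero value,
because each mapping of /dev/zero gets its own fresh zero-filled memory.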
> There are quite a few problems that this patch does not solve. First,
> while it does give you a shared memory segment, it doesn't provide you
> with any help at all in figuring out what to put in that segment. The
> task of figuring out how to communicate usefully through shared memory
> is thus, for the moment, left entirely to the application programmer.
> While there may be cases where that's just right, I suspect there will
> be a wider range of cases where it isn't, and I plan to work on some
> additional facilities, sitting on top of this basic structure, next,
> though probably as a separate patch.
Agreed.
> Second, it doesn't make any policy decisions about what is sensible
> either in terms of number of
> shared memory segments or the sizes of those segments, even though
> there are serious practical limits in both cases. Actually, the total
> number of segments system-wide is limited by the size of the control
> segment, which is sized based on MaxBackends. But there's nothing to
> keep a single backend from eating up all the slots, even though that's
> both pretty unfriendly and unportable, and there's no real limit to
> the amount of memory it can gobble up per slot, either. In other
> words, it would be a bad idea to write a contrib module that exposes a
> relatively uncooked version of this layer to the user.
To be honest, I am rather unconcerned about this for now.
> --- /dev/null
> +++ b/src/include/storage/dsm.h
> @@ -0,0 +1,40 @@
> +/*-------------------------------------------------------------------------
> + *
> + * dsm.h
> + * manage dynamic shared memory segments
> + *
> + * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * src/include/storage/dsm.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +#ifndef DSM_H
> +#define DSM_H
> +
> +#include "storage/dsm_impl.h"
> +
> +typedef struct dsm_segment dsm_segment;
> +
> +/* Initialization function. */
> +extern void dsm_postmaster_startup(void);
> +
> +/* Functions that create, update, or remove mappings. */
> +extern dsm_segment *dsm_create(uint64 size, char *preferred_address);
> +extern dsm_segment *dsm_attach(dsm_handle h, char *preferred_address);
> +extern void *dsm_resize(dsm_segment *seg, uint64 size,
> + char *preferred_address);
> +extern void *dsm_remap(dsm_segment *seg, char *preferred_address);
> +extern void dsm_detach(dsm_segment *seg);
Why do we want to expose something as unreliable as preferred_address in
the external interface? I haven't read the code yet, so I might be
missing something here.
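What I mean by unreliable, as a small sketch: without MAP_FIXED, the address
passed to mmap() is only a hint, and the kernel is free to place the mapping
elsewhere. So callers cannot count on the same segment sitting at the same
address in every process, and would have to store offsets rather than raw
pointers anyway. The helper names below are hypothetical and are not part of
the proposed dsm.h.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map anonymous memory, passing `hint` as a preferred address (no MAP_FIXED). */
static void *map_with_hint(void *hint, size_t len)
{
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Convert a pointer inside a segment into an offset from the segment base. */
static size_t ptr_to_offset(const void *base, const void *p)
{
    return (size_t) ((const char *) p - (const char *) base);
}

/* Convert an offset back into a pointer, relative to this process's base. */
static void *offset_to_ptr(void *base, size_t off)
{
    return (char *) base + off;
}
```

Asking for an address that is already occupied shows the effect: the second
call to map_with_hint() succeeds but returns a different address, and only
the offset-based form remains meaningful across the two mappings.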
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services