Re: dynamic shared memory

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dynamic shared memory
Date: 2013-09-24 18:19:53
Message-ID: 20130924181953.GC11521@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-09-24 12:19:51 -0400, Robert Haas wrote:
> On Fri, Sep 20, 2013 at 7:44 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> > Hm, I guess you dont't want to add it to global/ or so because of the
> >> > mmap implementation where you presumably scan the directory?
> >>
> >> Yes, and also because I thought this way would make it easier to teach
> >> things like pg_basebackup (or anybody's home-brew scripts) to just
> >> skip that directory completely. Actually, I was wondering if we ought
> >> to have a directory under pgdata whose explicit charter it was to
> >> contain files that shouldn't be copied as part of a base backup.
> >> pg_do_not_back_this_up.
> >
> > Wondered exactly about that as soon as you've mentioned
> > pg_basebackup. pg_local/?
>
> That seems reasonable. It's not totally transparent what that's
> supposed to mean, but it's fairly mnemonic once you know. Other
> suggestions welcome, if anyone has ideas.

pg_node_local/ was the only reasonable thing I could think of otherwise,
and I disliked it because it seems we shouldn't introduce "node" as a
term just for this.

> Are there any other likely candidates for inclusion in that directory
> other than this stuff?

You could argue that pg_stat_tmp/ is one.

> >> > Why are you using open() and not
> >> > BasicOpenFile or even OpenTransientFile?
> >>
> >> Because those don't work in the postmaster.
> >
> > Oh, that's news to me. Seems strange, especially for BasicOpenFile.

> Per its header comment, InitFileAccess is not called in the
> postmaster, so there's no VFD cache. Thus, any attempt by
> BasicOpenFile to call ReleaseLruFile would be pointless at best.

Well, but it makes code running in both backends and postmaster easier
to write. Good enough for me anyway.

> >> > Imo that's a PANIC or at the very least a FATAL.
> >>
> >> Sure, that's a tempting option, but it doesn't seem to serve any very
> >> necessary point. There's no data corruption problem if we proceed
> >> here. Most likely either (1) there's a bug in the code, which
> >> panicking won't fix or (2) the DBA hand-edited the state file, in
> >> which case maybe he shouldn't have done that, but if he thinks the
> >> best way to recover from that is a cluster-wide restart, he can do
> >> that himself.
> >
> > "There's no data corruption problem if we proceed" - but there likely
> > has been one leading to the current state.
>
> I doubt it. It's more likely that the file permissions got changed or
> something.

We panic in that case during a shutdown, don't we? ... Yep:
PANIC: could not open control file "global/pg_control": Permission denied

> > I am not talking about lwlocks itself being setup but an environment
> > that has resource owners defined and catches errors. I am specifically
> > asking because you're a) ereport()ing without releasing an LWLock b)
> > unconditionally relying on the fact that there's a current resource
> > owner.
> > In shared_preload_libraries neither is the case afair?

> I don't really feel like solving all of those problems and, TBH, I
> don't see why it's particularly important. If somebody wants a
> loadable module that can be loaded either from
> shared_preload_libraries or on the fly, and they use dynamic shared
> memory in the latter case, then they can use it in the former case as
> well. If they've already got logic to create the DSM when it's first
> needed, it doesn't cost extra to do it that way in both cases.

Fair enough.

> >> They'll continue to see the portion they have mapped, but must do
> >> dsm_remap() if they want to see the whole thing.
> >
> > But resizing can shrink, can it not? And we do an ftruncate() in at
> > least the posix shmem case. Which means the other backend will get a
> > SIGSEGV accessing that memory IIRC.

> Yep. Shrinking the shared memory segment will require special
> caution. Caveat emptor.

Then a comment to that effect would be nice.

> > We're not talking about a missed munmap() but about one that failed. If
> > we unpin the leaked pins and notice that we haven't actually pinned it
> > anymore we do error (well, Assert) out. Same for TupleDescs.
> >
> > If there were valid scenarios in which you could get into that
> > situation, maybe. But which would that be? ISTM we can only get there if
> > our internal state is messed up.

> I don't know. I think that's part of why it's hard to decide what we
> want to happen. But personally I think it's paranoid to say, well,
> something happened that we weren't expecting, so that must mean
> something totally horrible has happened and we'd better die in a fire.

Well, by that argument we wouldn't need to PANIC on a whole host of
issues. Like segfaults.

Anyway, I guess we need other opinions here.

> >> > Why isn't the port number part of the posix shmem identifiers? Sure, we
> >> > retry, but using a logic similar to sysv_shmem.c seems like a good idea.
> >>
> >> According to the man page for shm_open on Solaris, "For maximum
> >> portability, name should include no more than 14 characters, but this
> >> limit is not enforced."
> >
> > What about "/pgsql.%u" or something similar? That should still fit.
>
> Well, if you want both the port and the identifier in there, that
> doesn't get you there.

Port seems enough to start with - most machines are configured to only
have one cluster starting on one port. That way we wouldn't always get
conflicts but just if somebody does something crazy.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-09-24 18:25:53 Re: FW: REVIEW: Allow formatting in log_line_prefix
Previous Message Robert Haas 2013-09-24 18:05:34 Re: record identical operator