
Re: Dynamic Shared Memory stuff

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Dynamic Shared Memory stuff
Date:	2013-12-10 23:12:53
Message-ID:	20131210231253.GB1299924@tornado.leadboat.com
Lists:	pgsql-hackers

On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
> On 12/10/2013 07:27 PM, Noah Misch wrote:
> >On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
> >>>On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> >>>>* As discussed in the "Something fishy happening on frogmouth" thread, I
> >>>>don't like the fact that the dynamic shared memory segments will be
> >>>>permanently leaked if you kill -9 postmaster and destroy the data directory.

> >>I really think we need to do something about it. To use your earlier
> >>example of parallel sort, it's not acceptable to permanently leak a 512
> >>GB segment on a system with 1 TB of RAM.
> >
> >I don't. Erasing your data directory after an unclean shutdown voids any
> >expectations for a thorough, automatic release of system resources. Don't do
> >that. The next time some new use of a persistent resource violates your hope
> >for this scenario, there may be no remedy.
>
> Well, the point of erasing the data directory is to release system
> resources. I would normally expect "killall -9 <process>; rm -rf
> <data dir>" to thoroughly get rid of the running program and all the
> resources. It's surprising enough that the regular shared memory
> segment is left behind

Your expectation is misplaced. Processes and files are simply not the only
persistent system resources of interest.

> but at least that one gets cleaned up when
> you start a new server (on same port).

In the most-typical case, yes. In rare cases involving multiple postmasters
starting and stopping, the successor to the erased data directory will not
clean up the sysv segment.

> Let's not add more cases like that, if we can avoid it.

Only if we can avoid it for a modicum of effort and feature compromise.
You're asking for PostgreSQL to reshape its use of persistent resources so you
can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a
memory leak. That use case, not PostgreSQL, has the defect here.

> BTW, what if the data directory is seriously borked, and the server
> won't start? Sure, don't do that, but it would be nice to have a way
> to recover if you do anyway. (docs?)

If something is corrupting your data directory in an open-ended manner, you
have bigger problems than a memory leak until reboot. Recovering DSM happens
before we read the control file, so the damage would need to fall among a
short list of files for this to happen (bugs excluded). Nonetheless, I don't
object to documenting the varieties of system resources that PostgreSQL may
reserve and referencing the OS facilities for inspecting them.

Are you actually using PostgreSQL this way: frequent "killall -9 postgres; rm
-rf $PGDATA" after arbitrarily-bad $PGDATA corruption? Some automated fault
injection test rig, perhaps?

> >>One idea is to create the shared memory object with shm_open, and wait
> >>until all the worker processes that need it have attached to it. Then,
> >>shm_unlink() it, before using it for anything. That way the segment will
> >>be automatically released once all the processes close() it, or die. In
> >>particular, kill -9 will release it. (This is a variant of my earlier
> >>idea to create a small number of anonymous shared memory file
> >>descriptors in postmaster startup with shm_open(), and pass them down to
> >>child processes with fork()). I think you could use that approach with
> >>SysV shared memory as well, by destroying the segment with
> >>shmctl(IPC_RMID) immediately after all processes have attached to it.
> >
> >That leaves a window in which we still leak the segment,
>
> A small window is better than a large one.

Yes.
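
For illustration, the SysV variant quoted above might look roughly like the
following. This is a minimal sketch, not PostgreSQL code: the function name
sysv_rmid_demo and the 4096-byte size are made up for the example, and the
child "attaches" simply by inheriting the parent's attachment across fork().
The leak window is the stretch between shmget() and shmctl(IPC_RMID); a
kill -9 inside it leaves the segment behind until someone removes it.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper; returns 0 if the child saw the parent's data. */
int sysv_rmid_demo(void)
{
    /* Leak window opens: the segment now exists independently of us. */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid < 0)
        return -1;

    char *mem = shmat(shmid, NULL, 0);
    if (mem == (char *) -1)
    {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }

    /*
     * Leak window closes: the segment is marked for destruction and is
     * freed once the last attached process detaches or dies.
     */
    if (shmctl(shmid, IPC_RMID, NULL) < 0)
    {
        shmdt(mem);
        return -1;
    }

    strcpy(mem, "hello");

    pid_t pid = fork();
    if (pid < 0)
    {
        shmdt(mem);
        return -1;
    }
    if (pid == 0)
    {
        /* The child inherits the attachment made before fork(). */
        _exit(strcmp(mem, "hello") == 0 ? 0 : 1);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    shmdt(mem);
    if (waited < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Processes that are not descendants of the creator would have to call shmat()
themselves before the IPC_RMID, which is where the "all processes have
attached" wait comes from.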

> Another refinement is to wait for all the processes to attach before
> setting the segment's size with ftruncate(). That way, when the
> window is open for leaking the segment, it's still 0-sized so
> leaking it is not a big deal.
>
> >and it is less
> >general: not every use of DSM is conducive to having all processes attach in a
> >short span of time.
>
> Let's cross that bridge when we get there. AFAICS it fits all the
> use cases discussed this far.

It does fit the use cases discussed thus far.
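
As a rough sketch of the shm_open() proposal with the ftruncate() refinement,
something like the following could work. It is illustrative only: the function
name posix_unlink_demo, the segment name, SEG_SIZE, and the pipe-based
handshake are assumptions made for the example, not anything from the DSM
patch. The worker attaches by name while the object is still 0-sized; only
then does the creator unlink the name and give the segment its real size.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SEG_SIZE 4096           /* illustrative size only */

/* Hypothetical helper; returns 0 if the worker saw the creator's data. */
int posix_unlink_demo(void)
{
    char name[64];
    int ready[2], go[2];
    char c = 'x';
    int rc = 1;

    snprintf(name, sizeof(name), "/dsm_sketch_%ld", (long) getpid());

    /* Leak window opens, but the object is still 0 bytes long. */
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (pipe(ready) < 0 || pipe(go) < 0)
    {
        shm_unlink(name);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0)
    {
        shm_unlink(name);
        return -1;
    }
    if (pid == 0)
    {
        /* Worker: attach by name, report in, wait for the segment to be sized. */
        close(ready[0]);
        close(go[1]);
        int wfd = shm_open(name, O_RDWR, 0);
        if (wfd < 0 || write(ready[1], &c, 1) != 1)
            _exit(2);
        if (read(go[0], &c, 1) != 1)
            _exit(3);
        char *w = mmap(NULL, SEG_SIZE, PROT_READ, MAP_SHARED, wfd, 0);
        if (w == MAP_FAILED)
            _exit(4);
        _exit(strcmp(w, "hello") == 0 ? 0 : 1);
    }

    close(ready[1]);
    close(go[0]);

    /* Wait until the worker has attached (or died), then drop the name. */
    int attached = (read(ready[0], &c, 1) == 1);
    shm_unlink(name);           /* leak window closes here */

    if (attached && ftruncate(fd, SEG_SIZE) == 0)
    {
        char *p = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p != MAP_FAILED)
        {
            strcpy(p, "hello");
            munmap(p, SEG_SIZE);
            /* Tell the worker the segment is sized and filled. */
            if (write(go[1], &c, 1) == 1)
                rc = 0;
        }
    }
    close(go[1]);               /* unblocks the worker on failure paths */
    close(ready[0]);
    close(fd);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        rc = 1;
    return rc;
}
```

On older glibc versions this needs to be linked with -lrt. The objection
raised earlier still applies: the scheme assumes every participant can attach
within a short span of time.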

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
