Re: DSM robustness failure (was Re: Peripatus/failures)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Larry Rosenman <ler(at)lerctr(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: DSM robustness failure (was Re: Peripatus/failures)
Date: 2018-10-17 20:43:14
Message-ID: CAEepm=0Oz7m_1EV8Bhc80mKn6XCXeFkRsVuFrXdFvjRcKBXR+A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 18, 2018 at 9:00 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> 2018-10-17 13:43:24.235 CDT [46467:6] LOG: dynamic shared memory control segment is corrupt
> TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 181)
>
> It looks to me like what's happening is
>
> (1) crashing process corrupts the DSM control segment somehow.

I wonder how. Apparently the mapped size was tiny (the least likely
explanation), control->magic was wrong, or control->maxitems and
control->nitems were inconsistent with each other or with the mapped size.
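
To make those three possibilities concrete, here is a minimal,
self-contained sketch of that kind of sanity check. It is not the
code in dsm.c: the sketch_* names, the struct layout and the magic
value are all placeholders invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder magic value for this sketch; not taken from dsm.c. */
#define SKETCH_CONTROL_MAGIC 0x12345678u

/* Hypothetical per-segment slot in the control segment. */
typedef struct sketch_control_item
{
    uint32_t    handle;
    uint32_t    refcnt;
} sketch_control_item;

/* Hypothetical control header followed by a variable number of slots. */
typedef struct sketch_control_header
{
    uint32_t    magic;
    uint32_t    nitems;
    uint32_t    maxitems;
    sketch_control_item item[];     /* flexible array member */
} sketch_control_header;

/* Bytes needed for a header plus maxitems slots. */
size_t
sketch_control_bytes_needed(uint32_t maxitems)
{
    return offsetof(sketch_control_header, item)
        + sizeof(sketch_control_item) * (size_t) maxitems;
}

/* Returns true only if none of the three failure modes applies. */
bool
sketch_control_segment_sane(const sketch_control_header *control,
                            size_t mapped_size)
{
    /* Mapping too small to even hold the fixed part of the header. */
    if (mapped_size < offsetof(sketch_control_header, item))
        return false;
    /* Magic number does not match. */
    if (control->magic != SKETCH_CONTROL_MAGIC)
        return false;
    /* maxitems claims more slots than the mapping can hold. */
    if (sketch_control_bytes_needed(control->maxitems) > mapped_size)
        return false;
    /* More items in use than the declared maximum. */
    if (control->nitems > control->maxitems)
        return false;
    return true;
}
```

The size check comes first so that the header fields are only read
once the mapping is known to be large enough to contain them.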

> (2) dsm_postmaster_shutdown notices that, bleats to the log, and
> figures its job is done.

Right, that seems to be the main problem.

> (3) dsm_postmaster_startup crashes on Assert because
> dsm_control_mapped_size isn't 0, because the old seg is still mapped.

Right.

> I would argue that both dsm_postmaster_shutdown and dsm_postmaster_startup
> are broken here; the former because it makes no attempt to unmap
> the old control segment (which it oughta be able to do no matter how badly
> broken the contents are), and the latter because it should not let
> garbage old state prevent it from establishing a valid new segment.

Looking.
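
As a rough model of the behaviour being asked for (not a patch, and
not the real functions), the sketch below uses calloc()/free() as a
stand-in for mapping and unmapping the control segment. All sketch_*
names are hypothetical. Shutdown always drops the mapping, even when
the contents are not trustworthy, and startup discards any leftover
mapping instead of asserting that there is none.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for dsm_control and dsm_control_mapped_size. */
typedef struct sketch_dsm_state
{
    void       *control;
    size_t      mapped_size;
} sketch_dsm_state;

/* Hypothetical unmap step; free() stands in for the real detach. */
void
sketch_unmap(sketch_dsm_state *st)
{
    free(st->control);
    st->control = NULL;
    st->mapped_size = 0;
}

/*
 * Shutdown: only trust the contents when they pass the sanity check,
 * but release the mapping itself in either case.
 */
void
sketch_postmaster_shutdown(sketch_dsm_state *st, bool contents_sane)
{
    if (st->control == NULL)
        return;
    if (!contents_sane)
        fprintf(stderr,
                "LOG: dynamic shared memory control segment is corrupt\n");
    /* else: walk the items and destroy each segment (omitted here) */
    sketch_unmap(st);
}

/*
 * Startup: tolerate stale state from a previous cycle rather than
 * asserting that mapped_size is zero.  Returns false if the stand-in
 * allocation fails.
 */
bool
sketch_postmaster_startup(sketch_dsm_state *st, size_t size)
{
    if (st->control != NULL)
        sketch_unmap(st);
    st->control = calloc(1, size);
    if (st->control == NULL)
        return false;
    st->mapped_size = size;
    return true;
}
```

With both changes, a corrupt control segment after a crash costs a
log message but no longer trips an assertion on restart.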

> BTW, the header comment on dsm_postmaster_startup is a lie, which
> is probably not unrelated to its failure to consider this situation.

Agreed.

--
Thomas Munro
http://www.enterprisedb.com
