From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Larry Rosenman <ler(at)lerctr(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: DSM robustness failure (was Re: Peripatus/failures) |
Date: | 2018-10-27 15:26:25 |
Message-ID: | CAA4eK1JrKXyhRVWJeUY1XcdCsNFZEfMPvbPUjTk+F6BN2uvuRw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Oct 18, 2018 at 2:33 PM Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>
> On Thu, Oct 18, 2018 at 5:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > The code below seems to be problematic:
> > dsm_cleanup_using_control_segment()
> > {
> >     ..
> >     if (!dsm_control_segment_sane(old_control, mapped_size))
> >     {
> >         dsm_impl_op(DSM_OP_DETACH, old_control_handle, 0, &impl_private,
> >                     &mapped_address, &mapped_size, LOG);
> >         ..
> >     }
> >
> > Here, don't we need to use the dsm_control_* variables instead of the
> > local mapped_* variables?
>
> I was a little fuzzy on when exactly
> dsm_cleanup_using_control_segment() and dsm_postmaster_shutdown() run,
> but after some more testing I think I have this straight now. You can
> test by setting dsm_control->magic to 42 in a debugger and trying
> three cases:
>
> 1. Happy shutdown: dsm_postmaster_shutdown() complains on shutdown.
> 2. kill -9 a non-postmaster process: dsm_postmaster_shutdown()
> complains during auto-restart.
> 3. kill -9 the postmaster, manually start up again:
> dsm_cleanup_using_control_segment() runs. It ignores the old segment
> quietly if it doesn't pass the sanity test.
>
> So to answer your question: no, dsm_cleanup_using_control_segment() is
> case 3. This entirely new postmaster process has never had the
> segment mapped in, so the dsm_control_* variables are not relevant
> here.
>
> Hmm.... but if you're running N other independent clusters on the same
> machine that started up after this cluster crashed in case 3, I think
> there is an N-in-four-billion chance that the segment with that ID now
> belongs to another cluster and happens to be its DSM control segment,
> and therefore passes the magic-number sanity test, and then we'll nuke
> it and all the segments it references. Am I missing something?
>
Unless the previous cluster (which crashed) has removed the segment,
how would a new cluster succeed in getting the same segment? Won't it
get EEXIST and retry with another ID?
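
To make the question concrete, here is a minimal sketch of the behaviour I have in mind. It is not the actual dsm.c code: seg_namespace, ns_create_exclusive, ns_remove and create_with_retry are hypothetical names, and the "namespace" is just an in-memory array standing in for the OS-wide set of shared memory segments. The only assumption it encodes is that creation is exclusive, so a handle that is still in use fails with EEXIST and the caller moves on to another candidate.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the OS-wide shared memory namespace. */
#define MAX_SEGMENTS 16

typedef struct
{
    uint32_t handles[MAX_SEGMENTS];
    size_t   count;
} seg_namespace;

/*
 * Exclusive create: returns EEXIST if the handle is already in use,
 * ENOSPC if the toy namespace is full, and 0 on success.
 */
static int
ns_create_exclusive(seg_namespace *ns, uint32_t handle)
{
    assert(ns != NULL);
    for (size_t i = 0; i < ns->count; i++)
    {
        if (ns->handles[i] == handle)
            return EEXIST;
    }
    if (ns->count == MAX_SEGMENTS)
        return ENOSPC;
    ns->handles[ns->count++] = handle;
    return 0;
}

/* Remove a handle from the namespace; returns true if it was present. */
static bool
ns_remove(seg_namespace *ns, uint32_t handle)
{
    for (size_t i = 0; i < ns->count; i++)
    {
        if (ns->handles[i] == handle)
        {
            ns->count--;
            ns->handles[i] = ns->handles[ns->count];
            return true;
        }
    }
    return false;
}

/*
 * Try each candidate handle in turn until an exclusive create succeeds.
 * Returns 0 and sets *out_handle on success, EEXIST if every candidate
 * collided (or none were given), or the first other error encountered.
 */
static int
create_with_retry(seg_namespace *ns, const uint32_t *candidates,
                  size_t ncandidates, uint32_t *out_handle)
{
    int rc = EEXIST;

    assert(out_handle != NULL);
    for (size_t i = 0; i < ncandidates; i++)
    {
        rc = ns_create_exclusive(ns, candidates[i]);
        if (rc == 0)
        {
            *out_handle = candidates[i];
            return 0;
        }
        if (rc != EEXIST)
            return rc;      /* a real failure, not a collision */
    }
    return rc;
}
```

In this model, a second cluster can end up with the crashed cluster's handle only after that handle has been removed from the namespace; while the old segment exists, the colliding candidate is skipped.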
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com