From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Larry Rosenman <ler(at)lerctr(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | Re: DSM robustness failure (was Re: Peripatus/failures) |
Date: | 2018-10-18 01:36:44 |
Message-ID: | 23944.1539826604@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Thu, Oct 18, 2018 at 1:10 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... However, I'm still slightly interested in how it
>> was that that broke DSM so thoroughly ...
> Me too. Frustratingly, that vm object might still exist on Larry's
> machine if it hasn't been rebooted (since we failed to shm_unlink()
> it), so if we knew its name we could write a program to shm_open(),
> mmap(), dump out to a file for analysis and then we could work out
> which of the sanity tests it failed and maybe get some clues.
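As a rough illustration of the dump program Thomas describes, here is a minimal sketch, assuming the leftover object is an ordinary POSIX shared memory segment whose name is known. The function name dump_shm_segment and the output path are hypothetical and do not come from the PostgreSQL tree. The segment is opened read-only and never unlinked, so the evidence stays in place.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Copy the contents of an existing POSIX shared memory object to a file.
 * Returns the number of bytes written, or -1 on failure.
 * (Hypothetical helper for illustration; not part of PostgreSQL.)
 */
long
dump_shm_segment(const char *name, const char *out_path)
{
    assert(name != NULL && out_path != NULL);

    /* Open read-only so we cannot disturb the segment's contents. */
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd < 0)
    {
        perror("shm_open");
        return -1;
    }

    /* The object's size tells us how much to map. */
    struct stat st;
    if (fstat(fd, &st) < 0)
    {
        perror("fstat");
        close(fd);
        return -1;
    }

    FILE *out = fopen(out_path, "wb");
    if (out == NULL)
    {
        perror("fopen");
        close(fd);
        return -1;
    }

    size_t size = (size_t) st.st_size;
    size_t written = 0;

    /* mmap() rejects a zero length, so an empty segment yields an empty file. */
    if (size > 0)
    {
        void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
        {
            perror("mmap");
            fclose(out);
            close(fd);
            return -1;
        }
        written = fwrite(p, 1, size, out);
        munmap(p, size);
    }

    close(fd);
    if (fclose(out) != 0 || written != size)
        return -1;
    return (long) written;
}
```

A caller would pass the segment name (which is exactly the piece of information missing here) and a destination file, then inspect the resulting file offline against the sanity checks in dsm.c.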
Larry's REL_10_STABLE failure logs are interesting:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2018-10-17%2020%3A42%3A17
2018-10-17 15:48:08.849 CDT [55240:7] LOG: dynamic shared memory control segment is corrupt
2018-10-17 15:48:08.849 CDT [55240:8] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:9] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:10] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:11] LOG: sem_destroy failed: Invalid argument
... lots more ...
2018-10-17 15:48:08.862 CDT [55240:122] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.862 CDT [55240:123] LOG: sem_destroy failed: Invalid argument
TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 182)
So at least in this case, not only did we lose the DSM segment but also
all of our semaphores. Is it conceivable that Python somehow destroyed
those objects, rather than stomping on the contents of the DSM segment?
If not, how do we explain this log?
Also, why is there branch-specific variation? The fact that v11 and HEAD
aren't whinging about lost semaphores is not hard to understand --- we
stopped using SysV semas. But why don't the older branches look like v10
here?
regards, tom lane