Re: O(1) DSM handle operations

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: O(1) DSM handle operations
Date: 2017-03-28 03:47:59
Message-ID: CAEepm=0wa5qWAJgmFbHjFEMEFwtvSGM+7-Qib_MGeqq5j3f9ug@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 28, 2017 at 3:52 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 27, 2017 at 5:13 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> This is just a thought for discussion, no patch attached...
>>
>> DSM operations dsm_create(), dsm_attach(), dsm_unpin_segment() perform
>> linear searches of the dsm_control->item array for either a free slot
>> or a slot matching a given handle. Maybe no one thinks this is a
>> problem, because in practice the number of DSM slots you need to scan
>> should be something like number of backends * some small factor at
>> peak.
>
> One thing I thought about when designing the format of the DSM control
> segment was that we need to (attempt to) reread the old segment after
> recovering from a crash, even if it's borked. With the current
> design, I think that nothing too bad can happen even if some or all of
> the old control segment has been overwritten with gibberish. I mean,
> if we get particularly unlucky, we might manage to remove a DSM
> segment that some other cluster is using, but we'd have to be very
> unlucky for things to even get that bad, and we shouldn't crash
> outright.
>
> If we replace the array with some more complicated data structure,
> we'd have to be sure that reading it is robust against it having been
> scrambled by a previous crash. Otherwise, it won't be possible to
> restart the cluster without manual intervention.

Couldn't the cleanup code continue to work just the same way, though? The
only extra structure is an intrusive freelist, and that could be
completely ignored by code that wants to scan the whole array after a
crash. The freelist would only be used to find a free slot after a
successful restart, once it has been rebuilt and is known to be sane, and
it could be sanity-checked whenever dsm_create() accesses it. So idea 2
doesn't seem to make that code any less robust, does it?
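
To make that concrete, here is a rough sketch of the shape I have in
mind. It is not a patch and not the real dsm.c layout: every name
(sketch_item, sketch_control, next_free, first_free and the three
functions) is made up for illustration, and refcnt == 0 simply stands in
for "slot is free". The point is only that the post-crash scan never
reads the freelist links, while allocation checks the head before
trusting it.

```c
#include <stdint.h>

#define SKETCH_MAX_ITEMS     8
#define SKETCH_INVALID_INDEX UINT32_MAX

/* Hypothetical control slot; refcnt == 0 means the slot is free. */
typedef struct
{
    uint32_t handle;
    uint32_t refcnt;
    uint32_t next_free;     /* intrusive freelist link, only used when free */
} sketch_item;

/* Hypothetical control segment: the flat array plus a freelist head. */
typedef struct
{
    uint32_t    nitems;     /* assumed already validated against SKETCH_MAX_ITEMS */
    uint32_t    first_free;
    sketch_item item[SKETCH_MAX_ITEMS];
} sketch_control;

/*
 * Post-crash cleanup: walk the whole array and release every slot that
 * looks in use.  The freelist fields are never read here, so garbage in
 * them cannot affect this path.  Returns the number of slots released.
 */
static uint32_t
sketch_cleanup_scan(sketch_control *c)
{
    uint32_t released = 0;

    for (uint32_t i = 0; i < c->nitems; i++)
    {
        if (c->item[i].refcnt != 0)
        {
            /* Real code would try to destroy the segment named by handle. */
            c->item[i].refcnt = 0;
            released++;
        }
    }
    return released;
}

/* Rebuild the freelist from the array alone, lowest index first. */
static void
sketch_rebuild_freelist(sketch_control *c)
{
    c->first_free = SKETCH_INVALID_INDEX;
    for (uint32_t i = c->nitems; i-- > 0;)
    {
        if (c->item[i].refcnt == 0)
        {
            c->item[i].next_free = c->first_free;
            c->first_free = i;
        }
    }
}

/*
 * Pop a free slot in O(1).  If the head is out of range or points at a
 * slot that is in use, fall back to rebuilding the list from the array.
 * Returns the slot index, or SKETCH_INVALID_INDEX if no slot is free.
 */
static uint32_t
sketch_alloc_slot(sketch_control *c, uint32_t handle)
{
    uint32_t i = c->first_free;

    if (i != SKETCH_INVALID_INDEX &&
        (i >= c->nitems || c->item[i].refcnt != 0))
    {
        sketch_rebuild_freelist(c);     /* head failed the sanity check */
        i = c->first_free;
    }
    if (i == SKETCH_INVALID_INDEX)
        return SKETCH_INVALID_INDEX;

    c->first_free = c->item[i].next_free;
    c->item[i].handle = handle;
    c->item[i].refcnt = 1;
    return i;
}
```

In this sketch a restart would call sketch_cleanup_scan() and then
sketch_rebuild_freelist(), after which sketch_alloc_slot() no longer
needs a linear search in the common case. Looking up a slot by handle is
a separate question that this does not address.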

Deterministic key_t values for SysV IPC do seem problematic though,
for multiple PostgreSQL clusters. Maybe that is a serious problem for
idea 1.

--
Thomas Munro
http://www.enterprisedb.com
