Re: Changeset Extraction v7.0 (was logical changeset generation)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changeset Extraction v7.0 (was logical changeset generation)
Date: 2014-01-15 18:28:25
Message-ID: CA+TgmoaWO2xU0LGQ_DFd0RaQM5ZRa8wB6u+-0UAdSrau1i0V=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Review of patch 0002:

- I think you should just regard ReplicationSlotCtlLock as protecting
the "name" and "active" flags of every slot. ReplicationSlotCreate()
would then not need to mess with the spinlocks at all, and
ReplicationSlotAcquire and ReplicationSlotDrop get a bit simpler too I
think. Functions like ReplicationSlotsComputeRequiredXmin() can
acquire this lock in shared mode and only lock the slots that are
actually active, instead of all of them.

- If you address /* FIXME: apply sanity checking to slot name */, then
I think that also addresses /* XXX: do we want to use truncate
identifier instead? */. In other words, let's just error out if the
name is too long. I'm not sure what other sanity checking is needed
here; maybe just restrict it to 7-bit characters to avoid problems
with encodings for individual databases varying.

- ReplicationSlotAcquire probably needs to ignore slots that are not active.

- ReplicationSlotAcquire should be tweaked so that the code that holds
the spinlock is more self-contained. If you adopt the above-proposed
recasting of ReplicationSlotCtlLock, then the part that holds the
spinlock can probably look like this: SpinLockAcquire(&slot->mutex);
was_active = slot->active; slot->active = true;
SpinLockRelease(&slot->mutex), which looks quite a bit safer.

- If there's a coding rule that slot->database can't be changed while
the slot is active, then the check to make sure that the user isn't
trying to bind to a slot with a mis-matching database could be done
before the code described in the previous point, avoiding the need to
go back and release the resource.

- I think the critical section in ReplicationSlotDrop is bogus. If
DeleteSlot() fails, we scarcely need to PANIC. The slot just isn't
gone.

- cancel_before_shmem_exit is only guaranteed to remove the
most-recently-added callback.

- Why does ReplicationSlotsComputeRequiredXmin() need to hold
ProcArrayLock at all?

- ReplicationSlotsComputeRequiredXmin scarcely needs to hold the
spinlock while it does all of those gyrations. It can just acquire
the spinlock, copy the three fields needed into local variables, and
release the spinlock. The rest can be worked out afterwards.
Similarly in ReplicationSlotsComputeRequiredXmin.

- A comment in KillSlot wonders whether locking is required. I say
yes. It's safe to take lwlocks and spinlocks during shmem exit, and
failing to do so seems like a recipe for subtle corner-case bugs.

- pg_get_replication_slots() wonders what permissions we require. I
don't know that any special permissions are needed here; the data
we're exposing doesn't appear to be sensitive. Unless I'm missing
something?

- PG_STAT_GET_LOGICAL_DECODING_SLOTS_COLS has a leftover "logical" in its name.

- There seems to be no interface to acquire or release slots from
either SQL or the replication protocol, nor any way for a client of
this code to update its slot details. The value of
catalog_xmin/data_xmin vs. effective_catalog_xmin/effective_data_xmin
is left to the imagination.

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-01-15 18:34:32 Re: drop duplicate buffers in OS
Previous Message Fabrízio de Royes Mello 2014-01-15 18:27:49 Re: