From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Changeset Extraction v7.0 (was logical changeset generation) |
Date: | 2014-01-18 13:35:47 |
Message-ID: | CA+TgmoasqtaJmgTRBqyRTpQdHDCqjt979W+zY66Ypk5RFEyeoQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jan 16, 2014 at 9:54 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> Maybe it would be better to get rid of active/in_use and have
>> three states: REPLSLOT_CONNECTED, REPLSLOT_NOT_CONNECTED,
>> REPLSLOT_FREE. Or something like that.
>
> Hm. Color me unenthusiastic. If you feel strongly I can change it, but
> otherwise not.
I found the active/in_use distinction confusing; I thought one
three-state flag rather than two Booleans might be clearer. But I
might be able to just suck it up.
>> >> - If there's a coding rule that slot->database can't be changed while
>> >> the slot is active, then the check to make sure that the user isn't
>> >> trying to bind to a slot with a mis-matching database could be done
>> >> before the code described in the previous point, avoiding the need to
>> >> go back and release the resource.
>> >
>> > I don't think slot->database should be allowed to change at all...
>>
>> Well, it can if the slot is dropped and a new one created.
>
> Well. That obviously requires the lwlock to be acquired...
Right, so the point of this comment originally was you had some logic
that could be moved sooner to avoid having to undo so much on a
failure.
>> >> - I think the critical section in ReplicationSlotDrop is bogus. If
>> >> DeleteSlot() fails, we scarcely need to PANIC. The slot just isn't
>> >> gone.
>> >
>> > Well, if delete slot fails, we don't really know at which point it
>> > failed which means that the on-disk state might not correspond to the
>> > in-memory state. I don't see a point in adding code trying to handle
>> > that case correctly...
>>
>> Deleting the slot should be an atomic operation. There's some
>> critical point before which the slot will be picked up by recovery and
>> after which it won't. You either did that operation, or not, and can
>> adjust the in-memory state accordingly.
>
> I am not sure I understand that point. We can either update the
> in-memory bit before performing the on-disk operations or
> afterwards. Either way, there's a way to be inconsistent if the disk
> operation fails somewhere inbetween (it might fail but still have
> deleted the file/directory!). The normal way to handle that in other
> places is PANICing when we don't know so we recover from the on-disk
> state.
> I really don't see the problem here? Code doesn't get more robust by
> doing s/PANIC/ERROR/, rather the contrary. It takes extra smarts to only
> ERROR, often that's not warranted.
People get cranky when the database PANICs because of a filesystem
failure. We should avoid that, especially when it's trivial to do so.
The update to shared memory should be done second and should be set
up to be no-fail.
>> > The slot details get updates by the respective replication code. For
>> > streaming rep, that should happen via reply and feedback
>> > messages. For changeset extraction it happens when
>> > LogicalConfirmReceivedLocation() is called; the walsender interface
>> > does that using reply messages, the SQL interface calls it when
>> > finished (unless you use the _peek_ functions).
>>
>> Right, but where is this code? I don't see this updating the reply
>> and feedback message processing code to touch slots. Did I miss that?
>
> It's in "wal_decoding: logical changeset extraction walsender interface"
> currently :(. Splitting the streaming replication part of that patch off
> isn't easy...
Ack. I was hoping to work through these patches one at a time, but
that's not going to work if they are interdependent to that degree.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2014-01-18 14:14:02 | Re: currawong is not a happy animal |
Previous Message | Robert Haas | 2014-01-18 13:31:55 | Re: Changeset Extraction v7.0 (was logical changeset generation) |