From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node |
Date: | 2012-06-20 07:32:28 |
Message-ID: | CA+U5nMLsdxnqjv1cNpY82F5jekR8kusqYUiUnuvDm7aqzQXPXw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 20 June 2012 14:40, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> The reason we need an origin id in this scenario is that otherwise this will
> happen:
>
> 1. A row is updated on node A
> 2. Node B receives the WAL record from A, and updates the corresponding row
> in B. This generates a new WAL record.
> 3. Node A receives the WAL record from B, and updates the rows again. This
> again generates a new WAL record, which is replicated to A, and you loop
> indefinitely.
>
> If each WAL record carries an origin id, node A can use it to refrain from
> applying the WAL record it receives from B, which breaks the loop.
>
> However, note that in this simple scenario, if the logical log replay /
> conflict resolution is smart enough to recognize that the row has already
> been updated, because the old and the new rows are identical, the loop is
> broken at step 3 even without the origin id. That works for the
> newest-update-wins and similar strategies. So the origin id is not
> absolutely necessary in this case.
Including the origin id in the WAL allows us to filter out WAL records
when we generate LCRs, so we can completely avoid step 3, including
all of the CPU, disk and network overhead that implies. Simply put, we
know the change came from A, so no need to send the change to A again.
If we do allow step 3 to exist, we still need to send the origin id.
This is because there is no apriori way to know the origin id is not
required, such as the case where we have concurrent updates which is
effectively a race condition between actions on separate nodes. Not
sending the origin id because it is not required in some cases is
equivalent to saying we can skip locking because a race condition does
not happen in all cases. Making a case that the race condition is rare
is still not a case for skipping locking. Same here: we need the
information to avoid making errors in the general case.
> Another interesting scenario is that you maintain a global counter, like in
> an inventory system, and conflicts are resolved by accumulating the updates.
> For example, if you do "UPDATE SET counter = counter + 1" simultaneously on
> two nodes, the result is that the counter is incremented by two. The
> avoid-update-if-already-identical optimization doesn't help in this case,
> the origin id is necessary.
>
> Now, let's take the inventory system example further. There are actually two
> ways to update a counter. One is when an item is checked in or out of the
> warehouse, ie. "UPDATE counter = counter + 1". Those updates should
> accumulate. But another operation resets the counter to a specific value,
> "UPDATE counter = 10", like when taking an inventory. That should not
> accumulate with other changes, but should be newest-update-wins. The origin
> id is not enough for that, because by looking at the WAL record and the
> origin id, you don't know which type of an update it was.
Yes, of course. Conflict handling in the general case requires much
additional work. This thread is one minor change of many related
changes. The patches are being submitted in smaller chunks to ease
review, so sometimes there are cross links between things.
> So, I don't like the idea of adding the origin id to the record header. It's
> only required in some occasions, and on some record types.
That conclusion doesn't follow from your stated arguments.
> And I'm worried
> it might not even be enough in more complicated scenarios.
It is not the only required conflict mechanism, and has never been
claimed to be so. It is simply one piece of information needed, at
various times.
> Perhaps we need a more generic WAL record annotation system, where a plugin
> can tack arbitrary information to WAL records. The extra information could
> be stored in the WAL record after the rmgr payload, similar to how backup
> blocks are stored. WAL replay could just ignore the annotations, but a
> replication system could use it to store the origin id or whatever extra
> information it needs.
Additional information required by logical information will be handled
by a new wal_level.
The discussion here is about adding origin_node_id *only*, which needs
to be added on each WAL record.
One question I raised in my review was whether this extra information
should be added by a variable length header, so I already asked this
very question. So far there is no evidence that the additional code
complexity would be warranted. If it became so in the future, it can
be modified again. At this stage there is no need, so the proposal is
to add the field to every WAL record without regard to the setting of
wal_level because there is no measurable downside to doing so. The
downsides of additional complexity are clear and real however, so I
wish to avoid them.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2012-06-20 07:32:38 | Re: pgbench--new transaction type |
Previous Message | Simon Riggs | 2012-06-20 06:42:20 | Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node |