From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot
Date: 2018-11-30 14:55:47
Message-ID: CANP8+jKCbqqMSBeJFZWM0PT8Zf=ex13oYsN-Bcfze+_dTz-R9w@mail.gmail.com
Lists: pgsql-committers pgsql-hackers
On Thu, 1 Nov 2018 at 06:09, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Tue, Oct 23, 2018 at 10:43:38AM +0900, Michael Paquier wrote:
> > Well, following the same kind of thoughts, txid_current_snapshot() uses
> > sort_snapshot() to remove all the duplicates after fetching its data
> > from GetSnapshotData(), so wouldn't we also want to remove duplicates
> > here if dummy PGXACT entries are found while scanning the ProcArray?
> > What I think we should do is patch not only GetRunningTransactionData()
> > but also GetSnapshotData(), so that we don't have duplicates in that
> > case either, and do it in such a way that both code paths use the same
> > logic and we no longer need sort_snapshot(). That would be more costly
> > though...
>
> My apologies, it took a bit longer than I thought. I have had a patch on
> my desk for a couple of days, and finally took the time to finish
> something which addresses the concerns raised here. As long as we don't
> reach more than hundreds of thousands of entries, there is not going to
> be any performance impact. So what I do in the attached is to revert
> 1df21ddb, and then have GetRunningTransactionData() sort the XIDs in the
> snapshot and remove duplicates, for both xids and subxids, only if at
> least one dummy proc entry is found while scanning. This way, there is
> no need to burden most instance deployments with the extra sort/removal
> phase, as most don't use two-phase transactions. The sorting done at
> recovery when initializing the standby snapshot still needs to happen,
> of course.
>
> The patch is added to the upcoming CF for review.
>
1df21ddb looks OK to me and was simple enough to backpatch safely.
It seems excessive to say that the WAL record is corrupt; it just contains
duplicates, just as exported snapshots do. There are a few other imprecise
things in WAL, which is why we need the RunningXacts data in the first
place. So we have a choice of whether to remove the duplicates eagerly or
lazily.
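
To make the eager option concrete, here is a minimal standalone sketch of
sort-then-compact over an XID array, roughly what sort_snapshot() does for
exported snapshots. The typedef and the dedup_xids name are illustrative
only, not PostgreSQL's actual code:

#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;   /* stand-in for PostgreSQL's TransactionId */

static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    if (xa < xb)
        return -1;
    if (xa > xb)
        return 1;
    return 0;
}

/*
 * Sort xids[] and drop adjacent duplicates in place; returns the new count.
 * Illustrative only -- the real sort_snapshot() also handles subxids.
 */
static int
dedup_xids(TransactionId *xids, int nxids)
{
    int i, j;

    if (nxids <= 1)
        return nxids;

    qsort(xids, nxids, sizeof(TransactionId), xid_cmp);

    for (i = 1, j = 0; i < nxids; i++)
    {
        if (xids[i] != xids[j])
            xids[++j] = xids[i];
    }
    return j + 1;
}

The sort is O(n log n) in the number of tracked XIDs, which is why doing
it unconditionally on every call is the part worth debating.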
For GetRunningTransactionData(), we can do this either eagerly or lazily,
since it's not a foreground process. I don't object to changing it to be
eager in this path, but this patch is more complex than 1df21ddb, and I
don't think we should backpatch this change, assuming it is acceptable.
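
For illustration, a small self-contained sketch of the guard Michael
describes, using a simulated proc array. FakeProcEntry and the is_dummy
flag are stand-ins, not the actual PGXACT structures or the patch itself;
the point is that we only pay for the sort/dedup when a dummy
(prepared-transaction) entry was seen during the scan:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Simulated ProcArray entry; a dummy entry stands in for a prepared xact. */
typedef struct
{
    TransactionId xid;
    bool          is_dummy;
} FakeProcEntry;

static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    return (xa > xb) - (xa < xb);
}

int
main(void)
{
    /* XID 100 appears twice: once for the backend, once for its dummy entry. */
    FakeProcEntry procs[] = {
        {100, false}, {105, false}, {100, true}, {120, false}
    };
    int           nprocs = 4;
    TransactionId xids[8];
    int           nxids = 0;
    bool          found_dummy = false;

    for (int i = 0; i < nprocs; i++)
    {
        xids[nxids++] = procs[i].xid;
        if (procs[i].is_dummy)
            found_dummy = true;
    }

    /* Sort and compact only when a dummy entry contributed XIDs. */
    if (found_dummy)
    {
        int j = 0;

        qsort(xids, nxids, sizeof(TransactionId), xid_cmp);
        for (int i = 1; i < nxids; i++)
            if (xids[i] != xids[j])
                xids[++j] = xids[i];
        nxids = j + 1;
    }

    for (int i = 0; i < nxids; i++)
        printf("%u\n", (unsigned) xids[i]);
    return 0;
}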
This patch doesn't do that, but the suggestion that we touch
GetSnapshotData() in the same way, so that we de-duplicate eagerly, is a
different matter and would need careful performance testing to ensure we
don't slow down 2PC users.
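
As a crude way to get a feel for the numbers before any real testing, one
could time the extra per-snapshot sort in isolation. This is only a
user-space proxy with arbitrary sizes; proper testing would run pgbench
against a two-phase-commit workload on a patched server:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef uint32_t TransactionId;

static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    return (xa > xb) - (xa < xb);
}

int
main(void)
{
    enum { NXIDS = 512, SNAPSHOTS = 100000 };
    TransactionId base[NXIDS], work[NXIDS];

    for (int i = 0; i < NXIDS; i++)
        base[i] = (TransactionId) rand();

    clock_t start = clock();

    for (int s = 0; s < SNAPSHOTS; s++)
    {
        /* Copy, then sort: the extra work eager dedup would add per snapshot. */
        for (int i = 0; i < NXIDS; i++)
            work[i] = base[i];
        qsort(work, NXIDS, sizeof(TransactionId), xid_cmp);
    }

    double secs = (double) (clock() - start) / CLOCKS_PER_SEC;

    printf("%d snapshots of %d xids: %.3f s (%.2f us/snapshot)\n",
           SNAPSHOTS, NXIDS, secs, 1e6 * secs / SNAPSHOTS);
    return 0;
}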
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services