From: Noah Misch <noah(at)leadboat(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2021-01-09 10:17:59
Message-ID: 20210109101759.GA122398@rfd.leadboat.com
Lists: pgsql-hackers
On Wed, Jan 06, 2021 at 11:28:36AM +0500, Andrey Borodin wrote:
> The first patch fixes a bug where SLRU truncation around the wrapping point can happen too early.
> The basic idea of the patch is "if we want to delete a range, we must be eligible to delete both its beginning and its end".
> So to test whether a page is deletable, it tests that its first and last XIDs (or other SLRU units) are of no interest. To test whether a segment is deletable, it tests whether its first and last pages can be deleted.
Yes.
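To make the rule concrete, here is a minimal sketch of the segment-level test, using hypothetical names (SegmentIsDeletable is illustrative only; the real code goes through the SlruCtl callback, and SLRU_PAGES_PER_SEGMENT is 32 in slru.h):

#define SLRU_PAGES_PER_SEGMENT 32	/* as in slru.h */

/*
 * Sketch only: a segment may be removed only if both its first and its
 * last page precede the truncation cutoff, i.e. "delete a range only if
 * we may delete its beginning and its end".
 */
static bool
SegmentIsDeletable(int segpage, int cutoffPage,
                   bool (*PagePrecedes) (int, int))
{
    int     first_page = segpage;
    int     last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;

    return PagePrecedes(first_page, cutoffPage) &&
           PagePrecedes(last_page, cutoffPage);
}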
> I'm a little suspicious of the implementation of the *PagePrecedes(int page1, int page2) functions. Consider the following example from the patch:
>
> static bool
> CommitTsPagePrecedes(int page1, int page2)
> {
>     TransactionId xid1;
>     TransactionId xid2;
>
>     xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
>     xid1 += FirstNormalTransactionId + 1;
>     xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
>     xid2 += FirstNormalTransactionId + 1;
>
>     return (TransactionIdPrecedes(xid1, xid2) &&
>             TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
> }
>
> We add FirstNormalTransactionId to the xids to keep them from being special xids.
> We add COMMIT_TS_XACTS_PER_PAGE - 1 to xid2 to shift it to the end of the page. But because of the += FirstNormalTransactionId we land slightly off the page boundaries and risk that xid2 + COMMIT_TS_XACTS_PER_PAGE - 1 becomes FrozenTransactionId (FirstNormalTransactionId - 1). Hence the extra +1 on all values in scope. While the logic is correct, the coding is hard to follow. Maybe we could just use
Right. The overall objective is to compare the first XID of page1 to the
first and last XIDs of page2. The FirstNormalTransactionId+1 addend operates
at a lower level. It just makes TransactionIdPrecedes() behave like
NormalTransactionIdPrecedes() without the latter's assertion.
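For anyone reading along, here is the comparison logic paraphrased (the real code is in transam.c; treat this as a sketch rather than a verbatim copy), which shows why the shift matters: with a special XID on either side, TransactionIdPrecedes() falls back to plain unsigned comparison, and only normal XIDs get the wraparound-aware signed-difference test.

bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
    int32   diff;

    /* Special XIDs (Invalid, Bootstrap, Frozen) compare by magnitude. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return (id1 < id2);

    /* Normal XIDs compare modulo 2^32: "precedes" means the signed
     * difference is negative. */
    diff = (int32) (id1 - id2);
    return (diff < 0);
}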
> page1_first_normal_xid = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE + FirstNormalTransactionId;
> page2_first_normal_xid = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE + FirstNormalTransactionId;
> page2_last_xid = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE + CLOG_XACTS_PER_PAGE - 1;
> But I'm not insisting on this.
I see your point, but I doubt using different addends on different operands
makes this easier to understand. If anything, I'd lean toward adding more
explicit abstraction between "the XID we intend to test" and "the XID we're
using to fool some general-purpose API".
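If someone wants to pursue that, one possible shape, purely as a hypothetical sketch (XidToCompare is a made-up name, not from either patch):

/*
 * Hypothetical helper: take "the XID we intend to test" and return "the
 * XID we hand to TransactionIdPrecedes()", i.e. the same value nudged
 * into normal-XID space so the special-XID branch is never taken.
 */
static inline TransactionId
XidToCompare(TransactionId xid)
{
    return xid + FirstNormalTransactionId + 1;
}

CommitTsPagePrecedes() would then build xid1 and xid2 as
XidToCompare(((TransactionId) page) * COMMIT_TS_XACTS_PER_PAGE).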
> The following comment is not correct for 1 kB and 4 kB pages:
> + * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.
Fixed, thanks.
> None of the notes above block the fix; I just wanted to let the committer know about them. I think it's very important to have this bug fixed.
>
> The second patch converts the binary *PagePrecedes() functions to *PageDiff()s and adds logic to avoid deleting pages in suspicious cases. This logic depends on the scale of the value returned by the diff: it expects that overflow happens between INT_MIN and INT_MAX. Thus it prevents page deletion if page_diff <= INT_MIN / 2 (too far from the current cleaning point; and of course in normal cases).
>
> It must be a comparison here, not an equality test.
> - ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
> + ctl->PageDiff(segpage, trunc->earliestExistingPage))
That's bad. Fixed, thanks.
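To illustrate the guard described above, and only to illustrate it (a hypothetical sketch, not the slru-truncate-t-insurance patch; the names are made up):

#include <limits.h>

/*
 * Sketch: treat a segment as safe to truncate only when the signed page
 * difference says it is behind the cutoff, but not so far behind
 * (diff <= INT_MIN / 2) that something must already have gone wrong.
 */
static bool
SegmentSafeToTruncate(int segpage, int cutoffPage,
                      int (*PageDiff) (int, int))
{
    int     diff = PageDiff(segpage, cutoffPage);

    return diff < 0 && diff > INT_MIN / 2;
}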
> This
> int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
> seems to be functionally equivalent to
> int diff_max = ((QUEUE_MAX_PAGE - 1) / 2),
Do you think one conveys the concept better than the other?
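For what it's worth, the two expressions agree for any positive maximum under C integer division; a quick check with stand-in values (not the real QUEUE_MAX_PAGE):

#include <assert.h>

int
main(void)
{
    int     odd = 2097151;      /* stand-in odd maximum */
    int     even = 2097150;     /* stand-in even maximum */

    /* C division truncates toward zero, so both forms agree whether the
     * maximum is odd or even. */
    assert(((odd + 1) / 2) - 1 == (odd - 1) / 2);   /* both 1048575 */
    assert(((even + 1) / 2) - 1 == (even - 1) / 2); /* both 1048574 */
    return 0;
}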
> AFAICS the overall purpose of the 2nd patch is to help clusters corrupted by other bugs avoid deleting SLRU segments.
Yes.
> I'm a little bit afraid that this kind of patch can hide bugs (while potentially saving some users' data). That said, the patch seems like a useful precaution. Maybe we could emit scary warnings if the SLRU segments do not form a contiguous range?
Scary warnings are good for an observation that implies a bug, but the
slru-truncate-t-insurance patch causes such an outcome in non-bug cases where
it doesn't happen today. In other words, discontinuous ranges of SLRU
segments would be even more common after that patch. For example, it would
happen anytime oldestXID advances by more than ~1B at a time.
Thanks,
nm