Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2021-01-06 06:28:36
Message-ID: 561A2AF4-E745-4043-B5FC-9537AEE55701@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> 1 янв. 2021 г., в 23:05, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> написал(а):
>
> I've found this thread in CF looking for something to review.

We discussed patches with Noah offlist. I'm resending summary to list.

There are two patches:
1. slru-truncate-modulo-v5.patch
2. slru-truncate-t-insurance-v4.patch

It would be a bit easier to review if patches were versioned together (v5 both), because 2nd patch applies on top of 1st. Also 2nd patch have a problem if applying with git (in async.c).

First patch fixes a bug with possible SLRU truncation around wrapping point too early.
Basic idea of the patch is "If we want to delete a range we must be eligible to delete it's beginning and ending".
So to test if page is deletable it tests that first and last xids(or other SLRU's unit) are of no interest. To test if a segment is deletable it tests if first and last pages can be deleted.
Patch adds test in unusual manner: they are implemented as assert functions. Tests are fast, they are only checking basic and edge cases. But tests will not be run if Postgres is build without asserts.
I'm a little suspicious of implementation of *PagePrecedes(int page1, int page2) functions. Consider following example from the patch:

static bool
CommitTsPagePrecedes(int page1, int page2)
{
TransactionId xid1;
TransactionId xid2;

xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
xid2 += FirstNormalTransactionId + 1;

return (TransactionIdPrecedes(xid1, xid2) &&
TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
}

We are adding FirstNormalTransactionId to xids to avoid them being special xids.
We add COMMIT_TS_XACTS_PER_PAGE - 1 to xid2 to shift it to the end of the page. But due to += FirstNormalTransactionId we shift slightly behind page boundaries and risk that xid2 + COMMIT_TS_XACTS_PER_PAGE - 1 can become FrozenTransactionId (FirstNormalTransactionId - 1). Thus we add +1 to all values in scope. While the logic is correct, coding is difficult. Maybe we could just use
page1_first_normal_xid = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE + FirstNormalTransactionId;
page2_first_normal_xid = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE + FirstNormalTransactionId;
page2_last_xid = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE + CLOG_XACTS_PER_PAGE - 1;
But I'm not insisting on this.

Following comment is not correct for 1Kb and 4Kb pages
+ * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.

All above notes are not blocking the fix, I just wanted to let committer know about this. I think that it's very important to have this bug fixed.

Second patch converts binary *PagePreceeds() functions to *PageDiff()s and adds logic to avoid deleting pages in suspicious cases. This logic depends on the scale of returned by diff value: it expects that overflow happens between INT_MIN\INT_MAX. This it prevents page deletion if page_diff <= INT_MIN / 2 (too far from current cleaning point; and in normal cases, of cause).

It must be comparison here, not equality test.
- ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
+ ctl->PageDiff(segpage, trunc->earliestExistingPage))

This
int diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
seems to be functional equivalent of
int diff_max = ((QUEUE_MAX_PAGE - 1) / 2),

What I like about the patch is that it removes all described above trickery around + FirstNormalTransactionId + 1.

AFAICS the overall purpose of the 2nd patch is to help corrupted by other bugs clusters avoid deleting SLRU segments.
I'm a little bit afraid that this kind of patch can hide bugs (while potentially saving some users data). Besides this patch seems like a useful precaution. Maybe we could emit scary warnings if SLRU segments do not stack into continuous range?

Thanks!

Best regards, Andrey Borodin.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message matsumura.ryo@fujitsu.com 2021-01-06 06:29:30 RE: libpq debug log
Previous Message Dilip Kumar 2021-01-06 06:01:05 Re: Parallel Inserts in CREATE TABLE AS