From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andrew Bille <andrewbille(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18658: Assert in SerialAdd() due to race condition |
Date: | 2024-10-19 09:00:00 |
Message-ID: | ea48b857-4e07-dd43-375e-564e13f5bfb2@gmail.com |
Lists: | pgsql-bugs |
Hello Heikki,
18.10.2024 23:15, Heikki Linnakangas wrote:
>
> Thanks for the repro, Andrew & Alexander! I was able to reproduce this too. It reproduces very quickly with the script
> you provided, if you add this sleep to ReleasePredicateLocks():
>
> @@ -3654,6 +3667,8 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
>
> LWLockRelease(SerializableFinishedListLock);
>
> + pg_usleep(1000);
> +
> if (needToClear)
> ClearOldPredicateLocks();
>
> I think the assertion is too strict. It is normal for tailXid to be invalid in this scenario. The condition is that an
> XID was added to the finished list, but the global xmin has already advanced past that XID. It gets cleared from the
> finished list by the ClearOldPredicateLocks() call, but another backend might call SummarizeOldestCommittedSxact()
> before that.
>
> The attached patch fixes it.
>
Thank you for your attention to this!
I also encountered another, rarer failure with that script (initially
on REL_16_STABLE, but I've now reproduced it on master too), in which it
fails due to ENOSPC. (I could reproduce the failure more or less reliably
by running that script with parallel -j4 using 4 different servers.)
With additional logging added (see attached), I see the following:
2024-10-19 07:34:48.254 UTC [3032898:1][client backend][48/278:0] LOG: !!!SerialAdd| xid: 19957, serialControl->headPage: 4294967295, tailXid: 20491, SERIAL_ENTRIESPERPAGE: 1024, firstZeroPage: 20, targetPage: 19, isNewPage: 1
2024-10-19 07:34:48.254 UTC [3032898:2][client backend][48/278:0] STATEMENT: INSERT INTO t VALUES(42);
2024-10-19 07:34:48.254 UTC [3032898:3][client backend][48/278:0] LOG: !!!SerialAdd: isNewPage, firstZeroPage: 20, targetPage: 19
2024-10-19 07:34:48.254 UTC [3032898:4][client backend][48/278:0] STATEMENT: INSERT INTO t VALUES(42);
2024-10-19 07:35:05.105 UTC [3032898:5][client backend][48/278:0] ERROR: could not access status of transaction 0
2024-10-19 07:35:05.105 UTC [3032898:6][client backend][48/278:0] DETAIL: Could not write to file "pg_serial/11FB3" at offset 8192: No space left on device.
That is, if SerialAdd() gets an xid that precedes tailXid and belongs to a
preceding page, the page-zeroing loop just runs until ENOSPC.
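To illustrate the arithmetic, here is a minimal stand-alone sketch (not the actual PostgreSQL code). It assumes the page number is simply xid / SERIAL_ENTRIESPERPAGE and that page numbers wrap to 0 after some maximum; the helper names page_of, next_page and pages_zeroed and the max_page parameter are hypothetical and exist only for this model.

```c
#include <assert.h>
#include <stdint.h>

/* Value printed as SERIAL_ENTRIESPERPAGE in the log above. */
#define ENTRIES_PER_PAGE 1024

/* Hypothetical helper: the page an xid falls on (simplified model). */
static int64_t
page_of(uint32_t xid)
{
    return (int64_t) (xid / ENTRIES_PER_PAGE);
}

/* Hypothetical helper: the next page number, wrapping to 0 after max_page. */
static int64_t
next_page(int64_t page, int64_t max_page)
{
    return (page >= max_page) ? 0 : page + 1;
}

/*
 * Count the pages that a "zero every page from first_zero_page up to, but
 * not including, target_page" loop would touch. max_page is an assumed
 * upper bound for this model, not a value taken from the server.
 */
static int64_t
pages_zeroed(int64_t first_zero_page, int64_t target_page, int64_t max_page)
{
    int64_t count = 0;
    int64_t page;

    assert(first_zero_page >= 0 && first_zero_page <= max_page);
    assert(target_page >= 0 && target_page <= max_page);

    for (page = first_zero_page; page != target_page;
         page = next_page(page, max_page))
        count++;

    return count;
}
```

With the values from the log, page_of(19957) is 19 and page_of(20491) is 20, so the loop starts at page 20 and has to travel through the whole page space to reach page 19. In this model, pages_zeroed(20, 19, max_page) equals max_page, whereas the normal forward case pages_zeroed(19, 20, max_page) is 1.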
Your proposed fix (adjusted for REL_16_STABLE) eliminates the issue for me.
Thank you!
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
SerialAdd-debugging.patch | text/x-patch | 1.6 KB |