Re:Re:Re: backup server core when redo btree_xlog_insert that type is XLOG_BTREE_INSERT_POST

From:	yuansong <yyuansong(at)126(dot)com>
To:	"Peter Geoghegan" <pg(at)bowt(dot)ie>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re:Re:Re: backup server core when redo btree_xlog_insert that type is XLOG_BTREE_INSERT_POST
Date:	2024-11-27 10:53:20
Message-ID:	64d41b6e.9246.1936d410e18.Coremail.yyuansong@126.com
Lists:	pgsql-bugs pgsql-hackers

We have identified the cause of the crash: the XLOG_BTREE_INSERT_POST WAL record carried an OffsetNumber (offnum) that was one less than the offset actually stored in the index. As an experiment I added 1 to offnum, and the index data remained normal in both cases. The issue is likely caused by concurrent operations on the B-tree: reviewing the corresponding WAL records, we found SPLIT_L and INSERT_LEAF operations on the same block before the crash. This might be a bug; I'm not sure whether there is a related fix.
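
To make the off-by-one easier to discuss, here is a toy model in Python. It is only a sketch and not PostgreSQL code: `ToyTuple`, `toy_redo_insert_post`, and the sample `page` are hypothetical names invented for illustration, and the rule that the posting list tuple sits immediately before `offnum` is my assumption about how the REDO routine locates it. The sketch shows how a record whose offnum is one too small lands on a plain tuple, and how a sanity check can report that instead of reading garbage.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

TID = Tuple[int, int]  # (heap block number, offset) in this toy model


@dataclass
class ToyTuple:
    """Hypothetical stand-in for an index tuple on a leaf page."""
    key: int
    tids: List[TID] = field(default_factory=list)

    @property
    def is_posting(self) -> bool:
        # Toy rule: a tuple holding more than one TID is a posting list tuple.
        return len(self.tids) > 1


def toy_redo_insert_post(page: List[ToyTuple], offnum: int, postingoff: int) -> ToyTuple:
    """Find the posting list tuple a toy INSERT_POST record refers to.

    offnum is 1-based. Assumption: the posting list tuple to be split
    sits immediately before offnum on the page.
    """
    target = offnum - 1  # 1-based position of the expected posting tuple
    if target < 1 or target > len(page):
        raise ValueError(f"offnum {offnum} points outside the page")
    tup = page[target - 1]  # convert to a 0-based list index
    if not tup.is_posting:
        raise ValueError(f"tuple before offnum {offnum} is not a posting list")
    if not 0 < postingoff < len(tup.tids):
        raise ValueError(f"postingoff {postingoff} out of range for {len(tup.tids)} TIDs")
    return tup


# Sample leaf page: the posting list tuple is the second item.
page = [
    ToyTuple(10, [(1, 1)]),
    ToyTuple(20, [(1, 2), (1, 3), (1, 4)]),
    ToyTuple(30, [(2, 1)]),
]
```

With this page, `toy_redo_insert_post(page, 3, 1)` returns the tuple with key 20, while `toy_redo_insert_post(page, 2, 1)` (offnum one too small) raises `ValueError` because the tuple it lands on is a plain tuple. Without such a check, the plain tuple would be interpreted as a posting list, which is consistent with the nhtids value being larger than IndexTupleSize(oposting).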

At 2024-11-21 23:58:03, "Peter Geoghegan" <pg(at)bowt(dot)ie> wrote:
>On Thu, Nov 21, 2024 at 10:03 AM yuansong <yyuansong(at)126(dot)com> wrote:
>> Should nhtids be less than or equal to IndexTupleSize(oposting)?
>> Why is nhtids larger than IndexTupleSize(oposting) ? I think there should be an error in the master host writing the wal log.
>> Does anyone know when this will happen?
>
>It'll happen whenever there is a certain kind of data corruption.
>
>There were complaints about issues like this in the past. But those
>complaints seem to have gone away when more hardening was added to the
>code that runs during original execution (not the REDO routine code,
>which can only do what it is told to do by the WAL record).
>
>You're using PostgreSQL 13.2, which is a very old point release that
>lacks this hardening -- the current 13 point release is 13.18, so
>you're missing a lot. Had you been on a later point release you'd very
>probably have still had the issue with corruption (which could be from
>bad hardware), but you likely would have avoided the problem with the
>REDO routine crashing like this.
>
>--
>Peter Geoghegan

Attachment	Content-Type	Size
微信图片_20241127183704.jpg	image/jpeg	2.0 MB

In response to

Re: backup server core when redo btree_xlog_insert that type is XLOG_BTREE_INSERT_POST at 2024-11-21 15:58:03 from Peter Geoghegan
