Re:Re:Re: backup server core when redo btree_xlog_insert that type is XLOG_BTREE_INSERT_POST

From: yuansong <yyuansong(at)126(dot)com>
To: "Peter Geoghegan" <pg(at)bowt(dot)ie>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re:Re:Re: backup server core when redo btree_xlog_insert that type is XLOG_BTREE_INSERT_POST
Date: 2024-11-27 10:53:20
Message-ID: 64d41b6e.9246.1936d410e18.Coremail.yyuansong@126.com
Lists: pgsql-bugs pgsql-hackers

We found the reason for the crash.

The XLOG_BTREE_INSERT_POST WAL record carried an OffsetNumber (offnum) that was one less than the offset actually stored in the index. I experimented with adding 1 to offnum, and the index data remained normal in both cases. The issue is likely caused by concurrent operations on the B-tree: reviewing the corresponding WAL, we found SPLIT_L and INSERT_LEAF records for the same block before the crash. This might be a bug; I'm not sure whether a related fix already exists.
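To make the mismatch concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL code: `LeafItem`, `check_posting_insert`, and the page layout are hypothetical names invented for illustration. It assumes the posting list to be split sits in the slot just before offnum, and that a valid split offset must fall strictly inside the posting list.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

TID = Tuple[int, int]  # (heap block number, offset within block)


@dataclass
class LeafItem:
    """Toy stand-in for an index tuple on a leaf page (hypothetical)."""
    heap_tids: List[TID] = field(default_factory=list)

    @property
    def is_posting(self) -> bool:
        # In this toy model, a posting list is any tuple with more than one TID.
        return len(self.heap_tids) > 1


def check_posting_insert(page: List[LeafItem], offnum: int, postingoff: int) -> str:
    """Check a posting-list insert record against a toy page.

    offnum is 1-based and names the slot for the new item; the posting
    list to be split is assumed to be in the slot just before it.
    Returns "ok" or a short description of the inconsistency.
    """
    prev = offnum - 1
    if prev < 1 or prev > len(page):
        return "no item before offnum"
    oposting = page[prev - 1]
    if not oposting.is_posting:
        return "item before offnum is not a posting list"
    nhtids = len(oposting.heap_tids)
    if not (0 < postingoff < nhtids):
        return "postingoff out of range"
    return "ok"


# A page with a plain tuple, a 3-TID posting list, and another plain tuple.
page = [
    LeafItem([(1, 1)]),
    LeafItem([(2, 1), (2, 2), (2, 3)]),
    LeafItem([(3, 1)]),
]

print(check_posting_insert(page, 3, 1))  # offnum points just past the posting list
print(check_posting_insert(page, 2, 1))  # offnum one too small: wrong tuple is picked
```

With offnum off by one, the check lands on a tuple that is not a posting list, which is the kind of inconsistency a redo routine cannot repair on its own.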

At 2024-11-21 23:58:03, "Peter Geoghegan" <pg(at)bowt(dot)ie> wrote:
>On Thu, Nov 21, 2024 at 10:03 AM yuansong <yyuansong(at)126(dot)com> wrote:
>> Should nhtids be less than or equal to IndexTupleSize(oposting)?
>> Why is nhtids larger than IndexTupleSize(oposting) ? I think there should be an error in the master host writing the wal log.
>> Does anyone know when this will happen?
>
>It'll happen whenever there is a certain kind of data corruption.
>
>There were complaints about issues like this in the past. But those
>complaints seem to have gone away when more hardening was added to the
>code that runs during original execution (not the REDO routine code,
>which can only do what it is told to do by the WAL record).
>
>You're using PostgreSQL 13.2, which is a very old point release that
>lacks this hardening -- the current 13 point release is 13.18, so
>you're missing a lot. Had you been on a later point release you'd very
>probably have still had the issue with corruption (which could be from
>bad hardware), but you likely would have avoided the problem with the
>REDO routine crashing like this.
>
>--
>Peter Geoghegan

Attachment Content-Type Size
微信图片_20241127183704.jpg image/jpeg 2.0 MB
