Re: BUG #17245: Index corruption involving deduplicated entries

From: Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: BUG #17245: Index corruption involving deduplicated entries
Date: 2021-10-29 07:55:17
Message-ID: 551936fa-9ba8-aed1-7ae1-c77d5920101c@koumakan.jp
Lists: pgsql-bugs

On 29.10.2021 1:01, Andres Freund wrote:
>> The issue manifested again earlier today *after* a REINDEX followed by
>> enabling WAL replica logging on the 24th of October. I saved a snapshot of
>> the filesystem holding the data directory. Would that be useful for further
>> analysis?
> Yes, that's *quite* useful. I assume you can't just share that snapshot?

I am afraid it contains personal data (the mwuser table with e-mail
addresses, passwords, and so on) for multiple different MediaWiki
instances' databases. I will look into scrubbing that kind of data out
later today. I assume dropping the other databases from the cluster
should be fine and will not affect further analysis?

With the personal data scrubbed I will likely be able to provide SSH
access (with su/sudo available) to the VM if needed, though this will
take time (I will need to make a DMZ for that VM). Please inform me if
this would be desirable.

> Once we identified an affected heap and index page with the corruption, we
> should use pg_waldump to scan for all changes to that table.
>
> Do you have the log file(s) from between the 24th and now? That might give us
> a good starting point for the LSN range to scan.

There are multiple WAL files; the earliest has a timestamp of
Oct 25 09:45.

I am currently moving the snapshot over from my server to the VM I made
for this investigation. I will look into pg_waldump documentation as
soon as possible; I have not had to deal with WAL logs before.
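
For reference, a minimal sketch of the filtering step suggested above:
given pg_waldump's default text output, keep only the records whose block
references point at one relation (and optionally one block). The function
name and the way the relfilenode is supplied are assumptions made for
illustration; the regular expressions assume the "blkref #N: rel
spc/db/rel blk B" and "lsn: X/X" fields as printed by pg_waldump.

```python
import re

# Block reference as printed in pg_waldump's text output, e.g.
#   "blkref #0: rel 1663/13593/16384 blk 0"
#   "blkref #0: rel 1663/13593/16384 fork fsm blk 2"
BLKREF = re.compile(r"blkref #\d+: rel (\d+)/(\d+)/(\d+)(?: fork \w+)? blk (\d+)")
LSN = re.compile(r"lsn: ([0-9A-F]+/[0-9A-F]+)")


def records_touching(lines, relfilenode, block=None):
    """Yield (lsn, line) for WAL records referencing the given relfilenode.

    Hypothetical helper, not part of PostgreSQL. If `block` is given, only
    records referencing that block number are kept (the fork is ignored).
    """
    for line in lines:
        for _spc, _db, rel, blk in BLKREF.findall(line):
            if int(rel) == relfilenode and (block is None or int(blk) == block):
                match = LSN.search(line)
                yield (match.group(1) if match else None, line.rstrip("\n"))
                break  # report each record once, even with several blkrefs
```

The input would be the lines of pg_waldump output for the WAL range of
interest (its --start and --end options can narrow the LSN range), and the
relfilenode for the affected table or index can be looked up in pg_class.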

P.S. To possibly make things simpler: I am on #postgresql on
Libera as Remilia (or IijimaYun in case of disconnects) and am generally
available from 06:30 UTC to around 21:00 UTC.

--
K. R.
