From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Amcheck: do rightlink verification with lock coupling |
Date: | 2020-01-14 04:47:38 |
Message-ID: | F7527087-6E95-4077-B964-D2CAFEF6224B@yandex-team.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Peter! Sorry for answering so long.
> 11 янв. 2020 г., в 7:49, Peter Geoghegan <pg(at)bowt(dot)ie> написал(а):
>
> I'm curious why Andrey's corruption problems were not detected by the
> cross-page amcheck test, though. We compare the first non-pivot tuple
> on the right sibling leaf page with the last one on the target page,
> towards the end of bt_target_page_check() -- isn't that almost as good
> as what you have here in practice? I probably would have added
> something like this myself earlier, if I had reason to think that
> verification would be a lot more effective that way.
We were dealing with corruption caused by lost page update. Consider two pages:
A->B
If A is split into A` and C we get:
A`->C->B
But if update of A is lost we still have
A->B, but B backward pointers points to C.
B's smallest key is bigger than hikey of A`, but this do not violate
cross-pages invariant.
Page updates may be lost due to bug in backup software with incremental
backups, bug in storage layer of Aurora-style system, bug in page cache, incorrect
fsync error handling, bug in ssd firmware etc. And our data checksums do not
detect this kind of corruption. BTW I think that it would be better if our
checksums were not stored on a page itseft, they could detect this kind of faults.
We were able to timely detect all those problems on primaries in our testing
environment. But much later found out that some standbys were corrupted,
the problem appeared only when they were promoted.
Also, in nearby thread Grygory Rylov (g0djan) is trying to enable one more
invariant in standby checks.
Thanks!
Best regards, Andrey Borodin.
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2020-01-14 05:13:54 | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions |
Previous Message | Masahiko Sawada | 2020-01-14 04:35:57 | Re: [HACKERS] Block level parallel vacuum |