From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Amcheck: do rightlink verification with lock coupling |
Date: | 2020-01-16 09:50:28 |
Message-ID: | 0048EDA7-258A-4908-AB32-BE8273D8AEAD@yandex-team.ru |
Lists: | pgsql-hackers |
> On 14 Jan 2020, at 9:47, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> Page updates may be lost due to a bug in backup software with incremental
> backups, a bug in the storage layer of an Aurora-style system, a bug in the page cache, incorrect
> fsync error handling, a bug in SSD firmware, etc. And our data checksums do not
> detect this kind of corruption. BTW, I think it would be better if our
> checksums were not stored on the page itself; then they could detect this kind of fault.
I observed exactly this just now.
In one HA cluster a node was marked dead and disconnected from the cluster, but due to human error Postgres was still running on it.
That node managed to add a block-level incremental backup to the backup chain, and the backup software did not detect that this backup step was taken from a part of a timeline that is not in the actual timeline's history.
The result of restoring from that chain:
man-w%/%db R # select bt_index_check('%.pk_%');
bt_index_check
----------------
(1 row)
Time: 1411.065 ms (00:01.411)
man-w%/%db R # select patched_index_check('%.pk_%');
ERROR: XX002: left link/right link pair in index "pk_labels" not in agreement
DETAIL: Block=42705 left block=42707 left link from block=45495.
LOCATION: bt_recheck_block_rightlink, verify_nbtree.c:621
Time: 671.336 ms
('%' replaces removed characters)
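To make the failed check concrete, here is a toy model of the sibling-link invariant the error message refers to. It is only a sketch, not the code from the patch: the `Page` class and `check_sibling_links` function are hypothetical, and I am assuming the DETAIL fields mean "current block, block we arrived from via its right link, block the current page's left link points to".

```python
from dataclasses import dataclass
from typing import Dict, List

P_NONE = 0  # nbtree uses block 0 (the metapage) to mean "no sibling"


@dataclass
class Page:
    left: int   # block number of the left sibling, or P_NONE
    right: int  # block number of the right sibling, or P_NONE


def check_sibling_links(pages: Dict[int, Page], leftmost: int) -> List[str]:
    """Walk one level left to right and report pages whose left link
    does not point back to the page we arrived from."""
    problems = []
    seen = set()          # guards against looping forever on a corrupt cycle
    prev = P_NONE
    current = leftmost
    while current != P_NONE and current not in seen:
        seen.add(current)
        page = pages[current]
        if prev != P_NONE and page.left != prev:
            problems.append(
                f"Block={current} left block={prev} "
                f"left link from block={page.left}"
            )
        prev = current
        current = page.right
    return problems


# Hypothetical three-page level reproducing the shape of the report above:
# 42707 points right to 42705, but 42705 points left to 45495.
level = {
    1: Page(left=P_NONE, right=42707),
    42707: Page(left=1, right=42705),
    42705: Page(left=45495, right=P_NONE),
}
for line in check_sibling_links(level, leftmost=1):
    print(line)
```

A lost page update produces this picture: one sibling carries the link written by a page split, while the other was restored from a state where the split had a different outcome or never happened. Each page still has a valid checksum on its own.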
I understand that this corruption was not introduced by Postgres itself, but by a combination of bugs in two third-party tools and human error.
But I can imagine similar corruption with different root causes.
Best regards, Andrey Borodin.