From: | "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Add Information during standby recovery conflicts |
Date: | 2020-06-18 07:28:13 |
Message-ID: | 9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi hackers,
I've attached a patch to add information during standby recovery conflicts.
The motive behind is to be able to get some information:
* On the apply side
* On the blocker(s) side
_Motivation:_
When a standby recovery conflict occurs it could be useful to get more
information to be able to dive deep on the root cause and find a way to
avoid/mitigate new occurrences.
Adding this information would make the investigations easier, it could
help answering questions like:
* On which LSN was the WAL apply blocked?
* What was the purpose of the bocked WAL record?
* On which relation (if any) was the blocked WAL record related to?
* What was the blocker(s) doing?
* When did the blocker(s) started their queries (if any)?
* What was the blocker(s) waiting for? on which wait event?
_Technical context and proposal:_
There is 2 points in this patch:
* Add the information about the blocked WAL record. This is done in
standby.c (ResolveRecoveryConflictWithVirtualXIDs,
ResolveRecoveryConflictWithDatabase, StandbyTimeoutHandler)
* Add the information about the blocker(s). This is done in postgres.c
(RecoveryConflictInterrupt)
_Outcome Example:_
2020-06-15 06:48:23.774 UTC [6971] LOG: wal record apply is blocked by 2 connection(s), reason: User query might have needed to see row versions that must be removed.
2020-06-15 06:48:23.774 UTC [6971] LOG: blocked wal record rmgr: Heap2, lsn: 0/038E2678, received at: 2020-06-15 06:48:23.774098+00, desc: CLEAN, relation: rel 1663/13586/16652 fork main blk 0
2020-06-15 06:48:54.773 UTC [7088] LOG: about to interrupt pid: 7088, backend_type: client backend, state: active, wait_event_type: Timeout, wait_event: PgSleep, query_start: 2020-06-15 06:48:14.87672+00
2020-06-15 06:48:54.773 UTC [7088] DETAIL: statement: select *, pg_sleep(120) from bdt;
2020-06-15 06:48:54.773 UTC [7088] STATEMENT: select *, pg_sleep(120) from bdt;
2020-06-15 06:48:54.773 UTC [7088] ERROR: canceling statement due to conflict with recovery
2020-06-15 06:48:54.773 UTC [7088] DETAIL: User query might have needed to see row versions that must be removed.
2020-06-15 06:48:54.773 UTC [7088] STATEMENT: select *, pg_sleep(120) from bdt;
2020-06-15 06:48:54.778 UTC [7037] LOG: about to interrupt pid: 7037, backend_type: client backend, state: active, wait_event_type: Timeout, wait_event: PgSleep, query_start: 2020-06-15 06:48:13.008427+00
2020-06-15 06:48:54.778 UTC [7037] DETAIL: statement: select *, pg_sleep(300) from bdt;
2020-06-15 06:48:54.778 UTC [7037] STATEMENT: select *, pg_sleep(300) from bdt;
2020-06-15 06:48:54.778 UTC [7037] ERROR: canceling statement due to conflict with recovery
2020-06-15 06:48:54.778 UTC [7037] DETAIL: User query might have needed to see row versions that must be removed.
2020-06-15 06:48:54.778 UTC [7037] STATEMENT: select *, pg_sleep(300) from bdt;
It takes care of the other conflicts reason too.
So, should a standby recovery conflict occurs, we could see:
* information about the blocked WAL record apply
* information about the blocker(s)
I will add this patch to the next commitfest. I look forward to your
feedback about the idea and/or implementation.
Regards,
Bertrand
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Add-Standby-Recovery-Conflicts-Info.patch | text/plain | 12.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2020-06-18 07:32:59 | Re: Review for GetWALAvailability() |
Previous Message | Peter Eisentraut | 2020-06-18 07:13:35 | Re: factorial of negative numbers |