From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | emit recovery stats via a new file or a new hook |
Date: | 2021-10-31 13:36:07 |
Message-ID: | CALj2ACVByc437xcNva3gfG++yWf+uerABuB7intuhQt7a69fNQ@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
It is sometimes very important to be able to answer customer
questions such as: What was the total time taken by the last recovery of
the server? How long did each phase of the startup process's
recovery/redo processing take? Why did the recovery take so long?
We've encountered these questions while dealing with PostgreSQL
customers. If these stats were available in an easily consumable
form, it would be easier for us to understand, debug and identify the
root cause of "recovery taking a long time" problems, improve things
where possible, and answer the customers' questions. These recovery
stats could also be read by an external analytics tool to show recovery
patterns to customers directly. Although postgres emits some info
via server logs thanks to the recent commit [3], it isn't easily
consumable for the use cases mentioned above.
Here are a few thoughts on how we could go about doing this, which I
proposed earlier in [1]:
1) capture and write recovery stats into a file
2) capture and emit recovery stats via a new hook
3) capture and write into a new system catalog table (assuming at the
end of the recovery the database is in a consistent state, but I'm not
sure if we ever update any catalog tables in/after the
startup/recovery phase)
As Robert rightly pointed out at [2], option (3) isn't easy to do,
so we can set that idea aside. Options (1) and (2) seem reasonable.
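To make options (1) and (2) a bit more concrete, here is a minimal standalone C sketch. It is not PostgreSQL code and not a proposed patch: the RecoveryStats struct, its fields, the recovery_stats_hook variable, the key=value file format and all function names are assumptions made up for illustration. Only the general shape (a global function pointer that an extension can set, in the style of existing PostgreSQL hooks) is meant to carry over.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-recovery stats; these fields are illustrative only. */
typedef struct RecoveryStats
{
    double total_secs;      /* wall-clock time of the whole recovery */
    double redo_secs;       /* time spent replaying WAL */
    double checkpoint_secs; /* time spent in the end-of-recovery checkpoint */
} RecoveryStats;

/* Option (2): a hypothetical hook, NULL unless an extension installs one. */
typedef void (*recovery_stats_hook_type) (const RecoveryStats *stats);
recovery_stats_hook_type recovery_stats_hook = NULL;

/*
 * Format the stats as a single key=value line (an assumed format).
 * Returns snprintf's result: the length needed, or negative on error.
 */
int
format_recovery_stats(const RecoveryStats *stats, char *buf, size_t buflen)
{
    assert(stats != NULL && buf != NULL);
    return snprintf(buf, buflen,
                    "total_secs=%.3f redo_secs=%.3f checkpoint_secs=%.3f\n",
                    stats->total_secs, stats->redo_secs,
                    stats->checkpoint_secs);
}

/* Option (1): write the stats line to a file. Returns 0 on success, -1 on failure. */
int
write_recovery_stats_file(const RecoveryStats *stats, const char *path)
{
    char  line[256];
    FILE *fp;
    int   len = format_recovery_stats(stats, line, sizeof(line));

    if (len < 0 || (size_t) len >= sizeof(line))
        return -1;              /* formatting failed or was truncated */
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (fputs(line, fp) == EOF)
    {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* In this sketch, called once when recovery finishes. */
void
report_recovery_stats(const RecoveryStats *stats, const char *path)
{
    if (path != NULL)
        (void) write_recovery_stats_file(stats, path);
    if (recovery_stats_hook != NULL)
        recovery_stats_hook(stats);
}
```

In this sketch the two options are not mutually exclusive: the file gives external tools something to read without loading any extension, while the hook lets an extension forward the same struct wherever it likes.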
Thoughts?
[1] - https://www.postgresql.org/message-id/CALj2ACUwb3x%2BJFHkXp4Lf603Q3qFgK0P6kSsJvZkH4QAvGv4ig%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CA%2BTgmoZ0b7JkNexaoGDXJ%3D8Zq%2B_NFZBek1oyyPU%2BDDsRi1dsCw%40mail.gmail.com
[3] - commit 9ce346eabf350a130bba46be3f8c50ba28506969
Author: Robert Haas <rhaas(at)postgresql(dot)org>
Date: Mon Oct 25 11:51:57 2021 -0400
Report progress of startup operations that take a long time.
Regards,
Bharath Rupireddy.