Re: pg_upgrade check for invalid databases

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Thomas Krennwallner <tk(at)postsubmeta(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade check for invalid databases
Date: 2024-10-01 07:42:22
Message-ID: 66A2DFDA-DF16-43BB-AED7-F061A9094142@yesql.se
Lists: pgsql-hackers

> On 1 Oct 2024, at 02:35, Thomas Krennwallner <tk(at)postsubmeta(dot)net> wrote:
>
> On 30/09/2024 17.29, Daniel Gustafsson wrote:
>>> On 30 Sep 2024, at 16:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> TBH I'm not finding anything very much wrong with the current
>>> behavior... this has to be a rare situation, do we need to add
>>> debatable behavior to make it easier?
>> One argument would be to make the checks consistent, pg_upgrade generally tries
>> to report all the offending entries to help the user when fixing the source
>> database. Not sure if it's a strong enough argument for carrying code which
>> really shouldn't see much use though.
> In general, I agree that this situation should be rare for deliberate DROP DATABASE interrupted in interactive sessions.
>
> Unfortunately, for (popular) tools that perform automatic "temporary database" cleanup, we could recently see an increase in invalid databases.
>
> The additional check for pg_upgrade was made necessary due to several unrelated customers having invalid databases that stem from left-over Prisma Migrate "shadow databases" [1]. We could not reproduce this Prisma Migrate issue yet, as those migrations happened some time ago. Maybe this bug really stems from a much older Prisma Migrate version and we only see the fallout now. This is still a TODO item.
>
> But it appears that this tool can get interrupted "at the wrong time" while it is deleting temporary databases (probably a manual Ctrl-C), and clients are unaware that this can then leave behind invalid databases.
>
> Those temporary databases do not cause any harm as they are not used anymore. But eventually, PG installations will be upgraded to the next major version, and it is only then when those invalid databases resurface after pg_upgrade fails to run the checks.

Databases containing transient data that is no longer needed, left behind by
buggy tools, are one thing, but pg_upgrade won't be able to differentiate
between those and invalid databases of legitimate interest.  Allowing
pg_upgrade to skip invalid databases exposes users to the risk of (potentially)
valuable data being dropped during the upgrade because they had not realized
that a rarely-used production database was invalid.

> Long story short: interactive DROP DATABASE interrupts are rare (they do exist, but customers are usually aware). Automation tools on the other hand may run DROP DATABASE and when they get interrupted at the wrong time they will then produce several left-over invalid databases. pg_upgrade will then fail to run the checks.

Checking and reporting all invalid databases during the check phase seems like
a user-friendly option here.  I can agree that the current behaviour isn't great
for users experiencing this issue.
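
To illustrate what "report all of them" could look like, here is a minimal
sketch in Python.  It is not the pg_upgrade code, only a model of the logic.
It assumes that an invalid database is marked by datconnlimit = -2 in
pg_database, and the function names, sample database names and report wording
are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: pg_database.datconnlimit = -2 marks a database as invalid
# (the state left behind by an interrupted DROP DATABASE).
INVALID_CONNLIMIT = -2


def find_invalid_databases(rows: Iterable[Tuple[str, int]]) -> List[str]:
    """Collect the names of all invalid databases, not only the first one."""
    return [name for name, connlimit in rows if connlimit == INVALID_CONNLIMIT]


def format_report(invalid: List[str]) -> str:
    """Build a single report listing every offending database.

    The wording is hypothetical; nothing is skipped or dropped, the user is
    only told what must be fixed in the source cluster.
    """
    if not invalid:
        return "ok"
    lines = ["fatal: the source cluster contains invalid databases:"]
    lines.extend(f"  {name}" for name in invalid)
    return "\n".join(lines)


# Rows as they might come back from:
#   SELECT datname, datconnlimit FROM pg_database
# The database names below are made up for the example.
rows = [
    ("postgres", -1),
    ("shadow_db_a", -2),
    ("app", 50),
    ("shadow_db_b", -2),
]

print(format_report(find_invalid_databases(rows)))
```

The point of the sketch is that the check only lists the databases and fails.
Deciding whether an invalid database is disposable or of legitimate interest
stays with the user.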

--
Daniel Gustafsson
