Re: making relfilenodes 56 bits

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
Date: 2022-08-23 03:03:14
Message-ID: CAFiTN-uFRNufBmJTeM6cbm8EPe8BM8TteGBa5Spn9S+CDAj89w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 23, 2022 at 1:46 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Aug 22, 2022 at 3:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > To solve that problem, how about rewriting the system table in the new
> > cluster which has a conflicting relfilenode? I think we can probably
> > do this conflict checking before processing the tables from the old
> > cluster.
>
> Thanks for chiming in.
>
> Right now, there are two parts to the relfilenumber preservation
> system, and this scheme doesn't quite fit into either of them. First,
> the dump includes commands to set pg_class.relfilenode in the new
> cluster to the same value that it had in the old cluster. The dump
> can't include any SQL commands that depend on what's happening in the
> new cluster because pg_dump(all) only connects to a single cluster,
> which in this case is the old cluster. Second, pg_upgrade itself
> copies the files from the old cluster to the new cluster. This doesn't
> involve a database connection at all. So there's no part of the
> current relfilenode preservation mechanism that can look at the old
> AND the new database and decide on some SQL to execute against the new
> database.
>
> I thought for a while that we could use the information that's already
> gathered by get_rel_infos() to do what you're suggesting here, but it
> doesn't quite work, because that function excludes system tables, and
> we can't afford to do that here. We'd either need to modify that query
> to include system tables - at least for the new cluster - or run a
> separate one to gather information about system tables in the new
> cluster. Then, we could put all the pg_class.relfilenode values we
> found in the new cluster into a hash table, loop over the list of rels
> this function found in the old cluster, and for each one, probe into
> the hash table. If we find a match, that's a system table that needs
> to be moved out of the way before calling create_new_objects(), or
> maybe inside that function but before it runs pg_restore.
>
> That doesn't seem too crazy, I think. It's a little bit of new
> mechanism, but it doesn't sound horrific. It's got the advantage of
> being significantly cheaper than my proposal of moving everything out
> of the way unconditionally, and at the same time it retains one of the
> key advantages of that proposal - IMV, anyway - which is that we don't
> need separate relfilenode ranges for user and system objects any more.
> So I guess on balance I kind of like it, but maybe I'm missing
> something.

Okay, so this seems essentially the same as your previous proposal,
except that instead of unconditionally rewriting all the system tables
we will rewrite only those that conflict with a user table or
pg_largeobject from the previous cluster. Although it might add some
implementation complexity on the pg_upgrade side, it seems cheaper
than rewriting everything.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
