Quick Links

Re: Add contrib/pg_logicalsnapinspect

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Add contrib/pg_logicalsnapinspect
Date:	2024-10-08 17:52:11
Message-ID:	CAD21AoAO3zyooPQ+E2rpEi1YBc+gxZr0CUtJwxaFJ-WWimJNvA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Oct 8, 2024 at 9:25 AM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Oct 08, 2024 at 04:25:29PM +1100, Peter Smith wrote:
> > Hi, here are some review comments for patch v11.
>
> Thanks for looking at it!
>
> > ======
> > contrib/pg_logicalinspect/specs/logical_inspect.spec
> >
> > 1.
> > nit - Add some missing spaces after commas (,) in the SQL.
>
> Fine by me, done in v12 attached.
>
> > ======
> > doc/src/sgml/pglogicalinspect.sgml
> >
> > 2.
> > + <note>
> > + <para>
> > + The <filename>pg_logicalinspect</filename> functions are called
> > + using a text argument that can be extracted from the output name of the
> > + <function>pg_ls_logicalsnapdir</function>() function.
> > + </para>
> > + </note>
> >
> > 2a. wording
> >
> > The wording "using a text argument that can be extracted" seems like a
> > hangover from the previous implementation; it does not even say what
> > that "text argument" means.
>
> That's right (it's mentioned later on (for each function description) that
> the argument represents the snapshot file name though).
>
> > Why not just say it is a snapshot
> > filename, something like below?
> >
> > SUGGESTION:
> > The pg_logicalinspect functions are called passing a snapshot filename
> > to be inspected. For example, pass a name obtained from the
> > pg_ls_logicalsnapdir() function.
>
> Yeah, I like it, but...
>
> > ~
> >
> > 2b. formatting
> >
> > nit - In the previous implementation the extraction of the LSN was
> > trickier, so this part was worthy of an SGML "NOTE". Now that it is
> > just a filename, I don't know if it needs to be a special note
> > anymore.
>
> In fact, giving it more thoughts, I think we can just remove this part.
> I don't see the extra value anymore and that's something that we may need to
> remove depending on what will be added to this module in the future.
>
> I think that having the argument explanation in each function description is
> enough, done that way in v12.
>
> >
> > ~~~
> >
> > 3.
> > +postgres=# SELECT meta.* FROM pg_ls_logicalsnapdir(),
> > +pg_get_logical_snapshot_meta(name) AS meta;
> > +
> > +-[ RECORD 1 ]--------
> > +magic | 1369563137
> > +checksum | 1028045905
> > +version | 6
> >
> > 3a.
> > If you are going to wrap the SQL across multiple lines like this, then
> > you should show the psql continuation prompt, so that the example
> > looks the same as what the user would see.
>
> I'm not sure about this one. If the user copy/paste the doc as it is then there
> is no psql continuation prompt. If the user does not copy/paste the doc then he
> might indeed see "something" else (but that's not surprising since he did not
> copy/paste). FWIW, there is similar examples in pgstatstatements.sgml.
>
> > ~
> >
> > 3b.
> > FYI, the output of that can return multiple records,
>
> Yes, as the test in this patch does.
>
> > which is
> > b.i) probably not what you intended to demonstrate
> > b.ii) not the same as what the example says
> >
> > e.g., I got this:
> > test_pub=# SELECT meta.* FROM pg_ls_logicalsnapdir(),
> > test_pub-# pg_get_logical_snapshot_meta(name) AS meta;
> > -[ RECORD 1 ]--------
> > magic | 1369563137
> > checksum | 681884630
> > version | 6
> > -[ RECORD 2 ]--------
> > magic | 1369563137
> > checksum | 2213048308
> > version | 6
> > -[ RECORD 3 ]--------
> > magic | 1369563137
> > checksum | 3812680762
> > version | 6
> > -[ RECORD 4 ]--------
> > magic | 1369563137
> > checksum | 3759893001
> > version | 6
> >
>
> I don't get the point here. The examples just show another way to use the functions,
> the ouput is more "anecdotal" than anything else.
>
> >
> > ~~~
> >
> > (Also those #3a, #3b comments apply to both examples)
> >
> > ======
> > src/backend/replication/logical/snapbuild.c
> >
> > 4.
> > - SnapBuild builder;
> > -
> > - /* variable amount of TransactionIds follows */
> > -} SnapBuildOnDisk;
> > -
> > #define SnapBuildOnDiskConstantSize \
> > offsetof(SnapBuildOnDisk, builder)
> > #define SnapBuildOnDiskNotChecksummedSize \
> >
> > Is it better to try to keep those "Size" macros defined along with
> > wherever the SnapBuildOnDisk is defined? Otherwise, if the structure
> > is ever changed, adjusting the macros could be easily overlooked.
>
> I think that the less we put in the snapbuild_internal.h the better. That said,
> I think you have a good point so I added a comment around the SnapBuildOnDisk
> definition instead in v12.
>
> >
> > ~~~
> >
> > 5.
> > ValidateAndRestoreSnapshotFile
> >
> > nit - See [1] #4 suggestion to declare 'sz' at scope where used. The
> > previous reason not to change this (e.g. "mainly inspired from
> > SnapBuildRestore") seems less relevant because now most lines of this
> > function have already been modified for other reasons.
>
> Right. I think that's a matter of taste and I do prefer to "only" do the
> necessary changes that are linked to the feature the patch is implementing.
>
> > ~~~
> >
> > 6.
> > SnapBuildRestore:
> >
> > + if (fd < 0 && errno == ENOENT)
> > + return false;
> > + else if (fd < 0)
> > + ereport(ERROR,
> > + (errcode_for_file_access(),
> > + errmsg("could not open file \"%s\": %m", path)));
> >
> > I think this code fragment looked like this before, and you only
> > relocated it,
>
> That's right.
>
> > but it still seems a bit awkward to write this way.
> > Since so much else has changed, how about also improving this in
> > passing, like below:
> >
> > if (fd < 0)
> > {
> > if (errno == ENOENT)
> > return false;
> >
> > ereport(ERROR,
> > (errcode_for_file_access(),
> > errmsg("could not open file \"%s\": %m", path)));
> > }
>
> Same, I do prefer to "only" do the necessary changes that are linked to the
> feature the patch is implementing (and why stop here, a similar change could be
> made in logical/origin.c too for example).
>

Thank you for updating the patch! I have some comments on v12 patch:

---
+ if (ondisk.builder.committed.xcnt > 0)
+ {
+ Datum *arrayelems;
+ int narrayelems = 0;
+
+ arrayelems = (Datum *)
palloc(ondisk.builder.committed.xcnt * sizeof(Datum));
+
+ for (; narrayelems < ondisk.builder.committed.xcnt;
narrayelems++)
+ arrayelems[narrayelems] =
Int64GetDatum((int64) ondisk.builder.committed.xip[narrayelems]);
+
+ values[i++] =
PointerGetDatum(construct_array_builtin(arrayelems, narrayelems,
INT8OID));
+ }

Since committed_xip and catchange_xip are xid[], we should use
TransactionIdGetDatum() and XIDOID instead.

I think that it would be cleaner if we pass
ondisk.builder.committed.xcnt instead of construct_array_builtin to
construct_array_buildin(). That is, we can rewrite it as follows:

for (int j = 0; j < ondisk.builder.committed.xcnt; j++)
arrayelems[j] = TransactionIdGetDatum(ondisk.builder.committed.xip[j]);

values[i++] = PointerGetDatum(construct_array_builtin(arrayelems,
ondisk.builder.committed.xcnt, XIDOID));

---
+# Test the pg_logicalinspect functions: that needs some permutation to
+# ensure that we are creating multiple logical snapshots and that one of them
+# contains ongoing catalogs changes.

If we use prepared transactions modifying catalog changes, can we
write the normal (i.e. not isolation check) tests? It would be easier
to write and add tests.

---
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/

If we need to use the isolation tests (see above comment), we need to
add both output_iso and tmp_check_iso as well.

---
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+
+ MemoryContextReset(context);
+
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));

I think we don't necessarily need to reset the memory context here.
Rather, I think we can just pass CurrentMemoryContext to
ValidateAndRestoreSnapshotFile() instead of passing the separate new
memory context.

---
+ fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+
+ if (fd < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\":
%m", path)));
+
+ context = AllocSetContextCreate(CurrentMemoryContext,
+
"logicalsnapshot inspect context",
+
ALLOCSET_DEFAULT_SIZES);
+
+ /* Validate and restore the snapshot to 'ondisk' */
+ ValidateAndRestoreSnapshotFile(&ondisk, path, fd, context);

It's a bit odd to me that this function opens a snapshot file and
passes both the file descriptor and file path. The file path is used
mostly only for error reporting in ValidateAndRestoreSnapshotFile(). I
guess it would be cleaner if we pass the file path to
ValidateAndRestoreSnapshotFile() which opens and validates the
snapshot file. Since SnapBuildRestore() wants to get false if the
specified file doesn't exist, we can also add missing_ok argument to
ValidateAndRestoreSnapshotFile(). That is, the function will be like:

void
ValidateAndRestoreSnapshotFile(SnapBuildOnDisk *ondisk, const char
*path, MemoryContext context, bool missing_ok)
{
:
fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);

if (fd < 0)
{
if (missing_ok && errno == ENOENT)
return false;
else
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open file \"%s\": %m", path)));
}
:

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Re: Add contrib/pg_logicalsnapinspect at 2024-10-08 16:25:21 from Bertrand Drouvot

Responses

Re: Add contrib/pg_logicalsnapinspect at 2024-10-09 08:12:32 from Bertrand Drouvot

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2024-10-08 17:52:49	Re: [PATCH] Add min/max aggregate functions to BYTEA
Previous Message	Nazir Bilal Yavuz	2024-10-08 17:30:23	Re: Refactoring postmaster's code to cleanup after child exit