Re: Adding comments to help understand psql hidden queries

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Christensen <david+pg(at)pgguru(dot)net>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Adding comments to help understand psql hidden queries
Date: 2025-01-18 20:37:54
Message-ID: 2597072.1737232674@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Christensen <david+pg(at)pgguru(dot)net> writes:
> Any further concerns/issues with this patch that I can address to help
> move it forward?

I got around to looking at this finally --- sorry that it's been on
the back burner for so long. I think this is basically a good idea
but it still requires a lot of sanding-down of rough edges.

The patch doesn't apply cleanly anymore, which is unsurprising since
it's been sitting for months; what might be more surprising is that
there was only one hunk that had to be fixed by hand.

I noticed also that "git diff --check" complains about a bunch of
whitespace style violations, and that brace layout and comment layout
largely fail to comply with PG project standards. I ran the patched
code through pgindent to get rid of those warnings, but did not really
look at whether any of what it changed could be done better.

(I attach a v6 with the results of those changes to pacify the cfbot.
I have not made any changes responding to my comments below.)

Playing around with what it does, my first observation is that the
results look absolutely horrid in an 80-column xterm window:

$ psql -E regression
psql (18devel)
Type "help" for help.

regression=# \d tenk1
/********************************* QUERY (\d) **********************************
/
SELECT c.oid,
...
ORDER BY 2, 3;
/*******************************************************************************
/
/********************************* QUERY (\d) **********************************
/
/* Get general table information *
/
SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, c.relhastriggers,
...

So evidently the effective line length is 81 characters not the 80
that the code claims to be using. I did not look to see where the
off-by-one error is.

On the particular xterm setup I use, backing off the number of stars
by one would be enough to make that look better; but I have very often
used setups where printing 80 characters and a newline would result in
a blank line. I think the comment width must be reduced to no more
than 79 characters. Even that seems a little questionable; are there
people who use less-than-80-column terminal windows? I think aiming
for 60 or so columns might be smarter.

There's another issue here too, arising from the fact that you want to
give translated strings to OutputComment(). That's laudable, but it
means that strlen() isn't even approximately the right computation for
how many columns the string will occupy on-screen. (There are very
likely some multibyte characters in the translated string, and then
again some of those characters could be double-width ideograms.)
Now psql does contain code that can compute the actual displayed width
of a translated string, but frankly I'm beginning to question the
value of the whole business. How about just printing a fixed number
of stars, like ten, and dropping the whole concept of a target line
length?

/********** QUERY (\d) **********/
/* Get general table information */
... blah blah blah ...
/*********************/

Secondly, looking at the whole output, it seems quite repetitive:

regression=# \d tenk1
/********************************* QUERY (\d) **********************************/
SELECT ...
/*******************************************************************************/
/********************************* QUERY (\d) **********************************/
/* Get general table information */
SELECT ...
/*******************************************************************************/
/********************************* QUERY (\d) **********************************/
/* Get information about each column
*/
SELECT ...
/*******************************************************************************/
/********************************* QUERY (\d) **********************************/
/* Get information about each index */
SELECT ...
/*******************************************************************************/
/********************************* QUERY (\d) **********************************/
/* Get information about row-level policies */
SELECT ...
/*******************************************************************************/
/********************************* QUERY (\d) **********************************/
/* Get information about extended statistics */
SELECT ...
/*******************************************************************************/
/********************************* QUERY (\d) **********************************/
/* Get information about each publication using this table */
SELECT ...
/*******************************************************************************/
/********************************* QUERY (\d) **********************************/
SELECT c.oid::pg_catalog.regclass
FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i
WHERE c.oid = i.inhparent AND i.inhrelid = '16418'
AND c.relkind != 'p' AND c.relkind != 'I'
ORDER BY inhseqno;
/*******************************************************************************/
/********************************* QUERY (\d) **********************************/
/* Get information about child tables */
SELECT ...
/*******************************************************************************/

Surely we do not need to repeat the "QUERY (\d)" line; in fact,
I think it's confusing to do so. That should appear but once
per user command.

I also find all the stars to be fairly visually distracting.
What do you think of losing those altogether in favor of blank
lines? Something like

/********** QUERY (\d) **********/
SELECT ...

/* Get general table information */
SELECT ...

/* Get information about each column */
SELECT ...

/* Get information about each index */
SELECT ...

/* Get information about row-level policies */
SELECT ...

/* Get information about extended statistics */
SELECT ...

/* Get information about each publication using this table */
SELECT ...

/* Get information about child tables */
SELECT ...

Maybe that's too far in the other direction, but it seems
worth thinking about.

(BTW, there is one query in the output for \d that lacks an
OutputComment gloss. Maybe that's one that got added since
this patch was written?)

Moving on to actual code review:

* The "curcmd" global variable is quite horrid IMO. It would
be only slightly less horrid if it were properly documented
and declared in a header file as globals should be. I suppose you
did that to avoid having to pass the command string down through
a ton of subroutines. However, with my proposal that the query
should be printed only once at the start, maybe we could relocate
the responsibility for printing it to somewhere closer to
exec_command(), and thus reduce the notational overhead?

* It took me awhile to realize that OutputComment resets the
target buffer while OutputCommentStars doesn't. Neither their
names nor their header comments give any clue about that
rather critical-to-callers property. I don't like the name
"OutputCommentStars" anyway as it puts emphasis on what it should
be *hiding* from callers, namely the form of the output. Not
sure about a better name.

* Enabling log_statement = 'all' proves that the comments added
by OutputComment are sent to the server, *even when -E is not on*:

2025-01-18 15:09:23.575 EST [2587557] LOG: statement: /* Get general table information */
SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, false AS relhasoids, c.relispartition, '', c.reltablespace, CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, c.relpersistence, c.relreplident, am.amname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)
LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)
WHERE c.oid = '16418';

I don't find that acceptable at all. For one thing, it makes the
stakes extremely high (as in possibly security-critical) that there
are no "*/" sequences in the label strings. On the whole I'd rather
arrange things so that the comments are only emitted to the psql
terminal and never sent to the server. It appears that that's
true already for some of them --- I didn't trouble to try to
understand why these behave differently.

* In connection with that, I'm none too comfortable with the
assumption "there are no inner comments that need to be escaped",
mainly for the comments that include fragments of user queries.
If we can ensure that none of this output gets to the server then
maybe it's not too critical, but I'm not really convinced. Is it
worth doing something to sanitize the comment contents?

* I think the \n here was unintended:
+ OutputComment(&buf, _("Get information about each column\n"));
That leads to some oddly-formatted output.

Anyway, I encourage you to work on these issues and see if we
can get to a committable patch.

regards, tom lane

Attachment Content-Type Size
v6-0001-Improve-SQL-comments-on-echo-hidden-output.patch text/x-diff 18.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Julian Andres Klode 2025-01-18 20:38:38 Re: InitControlFile misbehaving on graviton
Previous Message Mats Kindahl 2025-01-18 19:44:00 Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py