Re: getting "shell command argument contains a newline or carriage return:" error with pg_dumpall when db name have new line in double quote

From: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Srinath Reddy <srinath2133(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Date: 2025-04-06 17:34:51
Message-ID: 202504061734.mtvroeo3gn33@alvherre.pgsql
Lists: pgsql-hackers

On 2025-Apr-06, Tom Lane wrote:

> I'd be 100% behind forbidding all ASCII control characters in all
> identifiers. I can't see any situation in which that's a good thing,
> and I can think of plenty where it's a mistake (eg your editor
> decided to change space to tab) or done with underhanded intent.

Right.

> If we can cite the SQL standard then it's an entirely defensible
> restriction.

We can. It says (in 5.2, <token> and <separator>):

<regular identifier> ::= <identifier body>
<identifier body> ::= <identifier start> [ <identifier part>... ]
<identifier part> ::= <identifier start> | <identifier extend>
<identifier start> ::= !! See the Syntax Rules.
<identifier extend> ::= !! See the Syntax Rules.

Syntax Rules
1) An <identifier start> is any character in the Unicode General Category
classes “Lu”, “Ll”, “Lt”, “Lm”, “Lo”, or “Nl”.
NOTE 112 — The Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”,
“Lo”, and “Nl” are assigned to Unicode characters that are, respectively,
upper-case letters, lower-case letters, title-case letters, modifier
letters, other letters, and letter numbers.
2) An <identifier extend> is U+00B7, “Middle Dot”, or any character in the
Unicode General Category classes “Mn”, “Mc”, “Nd”, or “Pc”.
NOTE 113 — The Unicode General Category classes “Mn”, “Mc”, “Nd”, and
“Pc”, are assigned to Unicode characters that are, respectively,
non-spacing marks, spacing combining marks, decimal numbers, and connector
punctuations.

Control characters are in the Unicode General Category "Cc" (part of the major class "C"), which is not among the classes listed above, so they are allowed nowhere.

https://www.unicode.org/charts/script/

> Having said that, I'm not quite sure where we ought to implement
> the restriction, and it's possible that there are multiple places
> that would need to check.

Yeah, a general ban on control characters for all identifiers is harder
to implement than a restricted ban, because it probably involves the
lexer, and I'm not sure the resulting "syntax error" type of rejections
are going to be nice enough to users. A C-function-based rejection
seems more convenient at this stage.
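For illustration only, a C-function-based check for the ASCII control characters Tom proposes to ban can be a simple byte scan. This is a minimal sketch, not code from the thread: `identifier_has_control_char` is a hypothetical name, not an existing PostgreSQL function, and the sketch assumes a NUL-terminated identifier in UTF-8. Where such a check would be called (CREATE DATABASE, CREATE ROLE, or more generally) is exactly the open question above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not an existing PostgreSQL function): report whether
 * an identifier contains any ASCII control character, i.e. a byte in
 * 0x01..0x1F or the DEL byte 0x7F.
 *
 * Assumes the identifier is NUL-terminated and encoded in UTF-8. Every byte
 * of a UTF-8 multibyte sequence is >= 0x80, so a byte-wise scan cannot
 * mistake part of a non-ASCII character for a control character.
 *
 * Only ASCII controls are checked; the C1 controls U+0080..U+009F (also in
 * category "Cc") would require decoding and are out of scope here.
 */
bool
identifier_has_control_char(const char *ident)
{
    for (const unsigned char *p = (const unsigned char *) ident; *p != '\0'; p++)
    {
        /* 0x01..0x1F covers newline, carriage return, tab, etc.; 0x7F is DEL */
        if (*p < 0x20 || *p == 0x7F)
            return true;
    }
    return false;
}
```

A caller would reject the name with a dedicated error message (for example, naming the offending byte) rather than a generic syntax error, which is the usability advantage over a lexer-level ban mentioned above.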

> I concur that the day before feature freeze is not a good time to be
> designing this. Let's defer.

Augh.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle." (Larry Wall, Apocalypse 6)
