Re: [PATCH v1] Fix parsing of a complex type that has an array of complex types

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Arjan Marku <arjanmarku02(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH v1] Fix parsing of a complex type that has an array of complex types
Date: 2024-07-15 19:15:44
Message-ID: 3190309.1721070944@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Arjan Marku <arjanmarku02(at)gmail(dot)com> writes:
> INSERT INTO item_2d_table VALUES(1, '(1,{{("inv a",42,1.99),("inv
> b",42,1.99)},{("inv c",42,1.99),("inv d",42,2)}})');

> The INSERT statement will fail due to how complex types are parsed, I
> have included a patch in this email to support this scenario.

The actual problem with this input is that it's inadequately quoted.
The record fields need to be quoted according to the rules in

https://www.postgresql.org/docs/current/rowtypes.html#ROWTYPES-IO-SYNTAX

and then in the field that is an array, the array elements need to be
quoted according to the rules in

https://www.postgresql.org/docs/current/arrays.html#ARRAYS-IO

and then the innermost record fields need yet another level of quoting
(well, you might be able to skip that given that there are no special
characters in this particular data, but in general you'd have to).

It looks to me like your patch is trying to make array_in and
record_in sufficiently aware of each other's rules that the outer
quoting could be omitted, but I believe that that's a seriously bad
idea. In the first place, it's far from clear that that's even
possible without ambiguity (much less that this specific patch does
it correctly). In the second place, this will make it even more
difficult for these functions to issue on-point error messages for
incorrect input. (They're already struggling with that, see
e.g. [1].) And in the third place, these functions are nearly
unmaintainable spaghetti already. (There have also been complaints
that they're too slow, which this won't improve.) We don't need
another big dollop of complexity here.

Our normal recommendation for handwritten input is to not try to deal
with the complications of correctly quoting nested array/record data.
Instead use row() and array[] constructors. So you could write
something like

INSERT INTO item_2d_table
VALUES(1,
row(1, array[[row('inv a',42,1.99), row('inv b',42,1.99)],
[row('inv c',42,1.99), row('inv d',42,2)]]::inventory_item[]));

In this particular example we need an explicit cast to cue the
parser about the type of the array elements, but then it can
cope with casting the outer row() construct automatically.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CACJufxExAcpvrkbLGrZGdZ%3DbFAuj7OVp1mOhk%2BfsBzeUbOGuHQ%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nitin Motiani 2024-07-15 19:18:19 Re: long-standing data loss bug in initial sync of logical replication
Previous Message Nathan Bossart 2024-07-15 19:14:06 Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal