Re: Emitting JSON to file using COPY TO

From: Joe Conway <mail(at)joeconway(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2023-12-07 13:52:39
Message-ID: 0344a366-5176-4131-b7ca-b6b74857b841@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On 12/7/23 08:35, Daniel Verite wrote:
> Joe Conway wrote:
>
>> The attached should fix the CopyOut response to say one column. I.e. it
>> ought to look something like:
>
> Spending more time with the doc I came to the opinion that in this bit
> of the protocol, in CopyOutResponse (B)
> ...
> Int16
> The number of columns in the data to be copied (denoted N below).
> ...
>
> this number must be the number of columns in the source.
> That is for COPY table(a,b,c) the number is 3, independently
> on whether the result is formatted in text, cvs, json or binary.
>
> I think that changing it for json can reasonably be interpreted
> as a protocol break and we should not do it.
>
> The fact that this value does not help parsing the CopyData
> messages that come next is not a new issue. A reader that
> doesn't know the field separator and whether it's text or csv
> cannot parse these messages into fields anyway.
> But just knowing how much columns there are in the original
> data might be useful by itself and we don't want to break that.

Ok, that sounds reasonable to me -- I will revert that change.

> The other question for me is, in the CopyData message, this
> bit:
> " Messages sent from the backend will always correspond to single data rows"
>
> ISTM that considering that the "[" starting the json array is a
> "data row" is a stretch.
> That might be interpreted as a protocol break, depending
> on how strict the interpretation is.

If we really think that is a problem I can see about changing it to this
format for json array:

8<------------------
copy
(
with ss(f1, f2) as
(
select 1, g.i from generate_series(1, 3) g(i)
)
select ss from ss
) to stdout (format json, force_array);
[{"ss":{"f1":1,"f2":1}}
,{"ss":{"f1":1,"f2":2}}
,{"ss":{"f1":1,"f2":3}}]
8<------------------

Is this acceptable to everyone?

Or maybe this is preferred?
8<------------------
[{"ss":{"f1":1,"f2":1}},
{"ss":{"f1":1,"f2":2}},
{"ss":{"f1":1,"f2":3}}]
8<------------------

Or as long as we are painting the shed, maybe this?
8<------------------
[{"ss":{"f1":1,"f2":1}},
{"ss":{"f1":1,"f2":2}},
{"ss":{"f1":1,"f2":3}}]
8<------------------

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joe Conway 2023-12-07 13:56:41 Re: Emitting JSON to file using COPY TO
Previous Message David G. Johnston 2023-12-07 13:47:10 Re: Emitting JSON to file using COPY TO

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2023-12-07 13:56:41 Re: Emitting JSON to file using COPY TO
Previous Message Matthias van de Meent 2023-12-07 13:50:46 Re: initdb caching during tests