Re: Emitting JSON to file using COPY TO

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2023-12-07 14:11:08
Message-ID: CAKFQuwZ2e6QanxpoUarMva0mp1UJvT9bOJRwhkZ8w_E=K-s8ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Thursday, December 7, 2023, Joe Conway <mail(at)joeconway(dot)com> wrote:

> On 12/7/23 08:35, Daniel Verite wrote:
>
>> Joe Conway wrote:
>>
>> The attached should fix the CopyOut response to say one column. I.e. it
>>> ought to look something like:
>>>
>>
>> Spending more time with the doc I came to the opinion that in this bit
>> of the protocol, in CopyOutResponse (B)
>> ...
>> Int16
>> The number of columns in the data to be copied (denoted N below).
>> ...
>>
>> this number must be the number of columns in the source.
>> That is for COPY table(a,b,c) the number is 3, independently
>> on whether the result is formatted in text, cvs, json or binary.
>>
>> I think that changing it for json can reasonably be interpreted
>> as a protocol break and we should not do it.
>>
>> The fact that this value does not help parsing the CopyData
>> messages that come next is not a new issue. A reader that
>> doesn't know the field separator and whether it's text or csv
>> cannot parse these messages into fields anyway.
>> But just knowing how much columns there are in the original
>> data might be useful by itself and we don't want to break that.
>>
>
> Ok, that sounds reasonable to me -- I will revert that change.
>
> The other question for me is, in the CopyData message, this
>> bit:
>> " Messages sent from the backend will always correspond to single data
>> rows"
>>
>> ISTM that considering that the "[" starting the json array is a
>> "data row" is a stretch.
>> That might be interpreted as a protocol break, depending
>> on how strict the interpretation is.
>>
>
> If we really think that is a problem I can see about changing it to this
> format for json array:
>
> 8<------------------
> copy
> (
> with ss(f1, f2) as
> (
> select 1, g.i from generate_series(1, 3) g(i)
> )
> select ss from ss
> ) to stdout (format json, force_array);
> [{"ss":{"f1":1,"f2":1}}
> ,{"ss":{"f1":1,"f2":2}}
> ,{"ss":{"f1":1,"f2":3}}]
> 8<------------------
>
> Is this acceptable to everyone?
>
> Or maybe this is preferred?
> 8<------------------
> [{"ss":{"f1":1,"f2":1}},
> {"ss":{"f1":1,"f2":2}},
> {"ss":{"f1":1,"f2":3}}]
> 8<------------------
>
> Or as long as we are painting the shed, maybe this?
> 8<------------------
> [{"ss":{"f1":1,"f2":1}},
> {"ss":{"f1":1,"f2":2}},
> {"ss":{"f1":1,"f2":3}}]
> 8<------------------
>

Those are all the same breakage though - if truly interpreted as data rows
the protocol is basically written such that the array format is not
supportable and only the lines format can be used. Hence my “format 0
doesn’t work” comment for array output and we should explicitly add format
2 where we explicitly decouple lines of output from rows of data. That
said, it would seem in practice format 0 already decouples them and so the
current choice of the brackets on their own lines is acceptable.

I’d prefer to keep them on their own line.

I also don’t know why you introduced another level of object nesting here.
That seems quite undesirable.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joe Conway 2023-12-07 15:07:59 Re: Emitting JSON to file using COPY TO
Previous Message Tom Lane 2023-12-07 14:09:05 Re: Postgresql : pg_temp & pg_toast_temp

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2023-12-07 14:16:59 Re: Proposal to add page headers to SLRU pages
Previous Message Daniel Gustafsson 2023-12-07 14:06:41 Re: initdb caching during tests