From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2023-12-07 14:11:08
Message-ID: CAKFQuwZ2e6QanxpoUarMva0mp1UJvT9bOJRwhkZ8w_E=K-s8ww@mail.gmail.com
Lists: pgsql-general pgsql-hackers
On Thursday, December 7, 2023, Joe Conway <mail(at)joeconway(dot)com> wrote:
> On 12/7/23 08:35, Daniel Verite wrote:
>
>> Joe Conway wrote:
>>
>>> The attached should fix the CopyOut response to say one column. I.e. it
>>> ought to look something like:
>>>
>>
>> Spending more time with the doc I came to the opinion that in this bit
>> of the protocol, in CopyOutResponse (B)
>> ...
>> Int16
>> The number of columns in the data to be copied (denoted N below).
>> ...
>>
>> this number must be the number of columns in the source.
>> That is, for COPY table(a,b,c) the number is 3, independently
>> of whether the result is formatted in text, csv, json or binary.
>>
>> I think that changing it for json can reasonably be interpreted
>> as a protocol break and we should not do it.
>>
>> The fact that this value does not help parsing the CopyData
>> messages that come next is not a new issue. A reader that
>> doesn't know the field separator and whether it's text or csv
>> cannot parse these messages into fields anyway.
>> But just knowing how many columns there are in the original
>> data might be useful by itself and we don't want to break that.
>>
>
> Ok, that sounds reasonable to me -- I will revert that change.
>
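
For readers less familiar with the wire format under discussion, here is a minimal decoding sketch (function name and framing are mine, not from the patch) of the CopyOutResponse message Daniel refers to, following the documented layout: a 'H' type byte, an Int32 length, an Int8 overall format, an Int16 column count N, and N Int16 per-column format codes.
8<------------------
import struct

def parse_copy_out_response(msg: bytes):
    """Decode a CopyOutResponse ('H') message per the documented wire
    protocol: Int8 overall format, Int16 column count, then one Int16
    format code per column."""
    assert msg[0:1] == b"H"
    overall_format, ncols = struct.unpack_from("!bh", msg, 5)
    col_formats = list(struct.unpack_from(f"!{ncols}h", msg, 8))
    # For COPY table(a, b, c) the server reports ncols == 3, regardless of
    # whether the rows are rendered as text, csv, or (per this patch) json.
    return overall_format, ncols, col_formats
8<------------------
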
>> The other question for me is, in the CopyData message, this
>> bit:
>> " Messages sent from the backend will always correspond to single data
>> rows"
>>
>> ISTM that considering that the "[" starting the json array is a
>> "data row" is a stretch.
>> That might be interpreted as a protocol break, depending
>> on how strict the interpretation is.
>>
>
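
For illustration, a minimal sketch (names mine) of the CopyData framing that the quoted wording is about: each message is a 'd' type byte, an Int32 length that counts itself but not the type byte, and the payload bytes.
8<------------------
import struct

def parse_copy_data(msg: bytes) -> bytes:
    """Decode one CopyData ('d') message and return its payload bytes."""
    assert msg[0:1] == b"d"
    (length,) = struct.unpack_from("!i", msg, 1)
    return msg[5:1 + length]

# As the thread describes the current patch, the "[" and "]" delimiter lines
# would each arrive in a CopyData message of their own, alongside one message
# per row -- which is what the "single data rows" wording calls into question.
8<------------------
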
> If we really think that is a problem I can see about changing it to this
> format for json array:
>
> 8<------------------
> copy
> (
> with ss(f1, f2) as
> (
> select 1, g.i from generate_series(1, 3) g(i)
> )
> select ss from ss
> ) to stdout (format json, force_array);
> [{"ss":{"f1":1,"f2":1}}
> ,{"ss":{"f1":1,"f2":2}}
> ,{"ss":{"f1":1,"f2":3}}]
> 8<------------------
>
> Is this acceptable to everyone?
>
> Or maybe this is preferred?
> 8<------------------
> [{"ss":{"f1":1,"f2":1}},
> {"ss":{"f1":1,"f2":2}},
> {"ss":{"f1":1,"f2":3}}]
> 8<------------------
>
> Or as long as we are painting the shed, maybe this?
> 8<------------------
> [{"ss":{"f1":1,"f2":1}},
> {"ss":{"f1":1,"f2":2}},
> {"ss":{"f1":1,"f2":3}}]
> 8<------------------
>
Those are all the same breakage, though: if truly interpreted as data rows,
the protocol is basically written such that the array format is not
supportable and only the lines format can be used. Hence my “format 0
doesn’t work” comment for array output, and why we should add an explicit
format 2 that decouples lines of output from rows of data. That said, in
practice format 0 already seems to decouple them, so the current choice of
putting the brackets on their own lines is acceptable.
I’d prefer to keep them on their own line.
I also don’t know why you introduced another level of object nesting here.
That seems quite undesirable.
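
As an illustration of why that decoupling seems workable in practice, here is a minimal client-side sketch (function name mine, not part of the patch) for the brackets-on-their-own-lines layout: treat the "[" and "]" lines as array delimiters and everything else as one JSON object per line, tolerating either leading or trailing commas.
8<------------------
import json

def rows_from_json_array_copy(lines):
    """Yield one dict per data row from COPY ... TO ... (FORMAT json,
    FORCE_ARRAY) output, assuming '[' and ']' sit on their own lines and
    each row object occupies a single line separated by commas."""
    for line in lines:
        line = line.strip()
        if line in ("[", "]", ""):
            continue  # array delimiters / blank lines are not data rows
        yield json.loads(line.strip(","))
8<------------------
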
David J.