Re: Emitting JSON to file using COPY TO

From: Joe Conway <mail(at)joeconway(dot)com>
To: Sehrope Sarkuni <sehrope(at)jackdb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2023-12-06 22:38:21
Message-ID: 4ad29199-361f-4740-a919-12080ad5ed65@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On 12/6/23 16:42, Sehrope Sarkuni wrote:
> On Wed, Dec 6, 2023 at 4:29 PM Joe Conway <mail(at)joeconway(dot)com
> <mailto:mail(at)joeconway(dot)com>> wrote:
>
> > 1. Outputting a top level JSON object without the additional column
> > keys. IIUC, the top level keys are always the column names. A
> common use
> > case would be a single json/jsonb column that is already formatted
> > exactly as the user would like for output. Rather than enveloping
> it in
> > an object with a dedicated key, it would be nice to be able to
> output it
> > directly. This would allow non-object results to be outputted as
> well
> > (e.g., lines of JSON arrays, numbers, or strings). Due to how
> JSON is
> > structured, I think this would play nice with the JSON lines v.s.
> array
> > concept.
> >
> > COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
> > generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> > SOME_OPTION_TO_NOT_ENVELOPE)
> > {"foo":1}
> > {"foo":2}
> > {"foo":3}
>
> Your example does not match what you describe, or do I misunderstand? I
> thought your goal was to eliminate the repeated "foo" from each row...
>
>
> The "foo" in this case is explicit as I'm adding it when building the
> object. What I was trying to show was not adding an additional object
> wrapper / envelope.
>
> So each row is:
>
> {"foo":1}
>
> Rather than:
>
> "{"json_build_object":{"foo":1}}

I am still getting confused ;-)

Let's focus on the current proposed patch with a "minimum required
feature set".

Right now the default behavior is "JSON lines":
8<-------------------------------
COPY (SELECT x.i, 'val' || x.i as v FROM
generate_series(1, 3) x(i))
TO STDOUT WITH (FORMAT JSON);
{"i":1,"v":"val1"}
{"i":2,"v":"val2"}
{"i":3,"v":"val3"}
8<-------------------------------

and the other, non-default option is "JSON array":
8<-------------------------------
COPY (SELECT x.i, 'val' || x.i as v FROM
generate_series(1, 3) x(i))
TO STDOUT WITH (FORMAT JSON, FORCE_ARRAY);
[
{"i":1,"v":"val1"}
,{"i":2,"v":"val2"}
,{"i":3,"v":"val3"}
]
8<-------------------------------

So the questions are:
1. Do those two formats work for the initial implementation?
2. Is the default correct or should it be switched
e.g. rather than specifying FORCE_ARRAY to get an
array, something like FORCE_NO_ARRAY to get JSON lines
and the JSON array is default?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2023-12-06 22:56:22 Re: Emitting JSON to file using COPY TO
Previous Message Daniel Gustafsson 2023-12-06 22:22:46 Re: Trainning and Certification

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2023-12-06 22:56:22 Re: Emitting JSON to file using COPY TO
Previous Message Sehrope Sarkuni 2023-12-06 21:42:11 Re: Emitting JSON to file using COPY TO