Re: Emitting JSON to file using COPY TO

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2025-03-02 05:28:07
Message-ID: CAEG8a3J=Wb0dv4uwhhMXQBNPfZ63_ZaSdko8-chK-tc-wSAwog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Mon, Jan 27, 2025 at 4:17 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> hi.
>
> There are two ways we can use to represent the new copy format: json.
> 1.
> typedef struct CopyFormatOptions
> {
> bool binary; /* binary format? */
> bool freeze; /* freeze rows on loading? */
> bool csv_mode; /* Comma Separated Value format? */
> bool json_mode; /* JSON format? */
> ...
> }
>
> 2.
> typedef struct CopyFormatOptions
> {
> CopyFormat format; /* format of the COPY operation */
> .....
> }
>
> typedef enum CopyFormat
> {
> COPY_FORMAT_TEXT = 0,
> COPY_FORMAT_BINARY,
> COPY_FORMAT_CSV,
> COPY_FORMAT_JSON,
> } CopyFormat;
>
> both the sizeof(cstate->opts) (CopyToStateData.CopyFormatOptions) is 184.
> so the struct size will not influence the performance.
>
> I also did some benchmarks when using CopyFormat.
> the following are the benchmarks info:
> -------------------------------------------------------------------------------------------------------
> create unlogged table t as select g from generate_series(1, 1_000_000) g;
>
> build_type=release patch:
> copy t to '/dev/null' json \watch i=0.1 c=10
> last execution Time: 108.741 ms
>
> copy t to '/dev/null' (format text) \watch i=0.1 c=10
> last execution Time: 42.600 ms
>
> build_type=release master:
> copy t to '/dev/null' (format text) \watch i=0.1 c=10
> last execution Time Time: 42.948 ms
>
> ---------------------------------------------------------
> so a new version is attached, using the struct CopyFormatOptions.
>
> changes mainly in CopyOneRowTo.
> now it is:
> """"
> if(!cstate->rel)
> {
> memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
> TupleDescAttr(cstate->queryDesc->tupDesc, 0),
> cstate->queryDesc->tupDesc->natts *
> sizeof(FormData_pg_attribute));
> for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
> populate_compact_attribute(slot->tts_tupleDescriptor, i);
> BlessTupleDesc(slot->tts_tupleDescriptor);
> }
> """"
> reasoning for change:
> for composite_to_json to construct json key, we only need
> FormData_pg_attribute.attname
> but code path
> composite_to_json->fastgetattr->TupleDescCompactAttr->verify_compact_attribute
> means we also need to call populate_compact_attribute to populate
> other attributes.
>
> v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch,
> author is by Joel Jacobson.
> As I mentioned in above,
> replacing 3 bool fields by an enum didn't change the struct CopyFormatOptions.
> but consolidated 3 bool fields into one enum to make code more lean.
> I think the refactoring (v14-0001) is worth it.

I've refactored the patch to adapt the newly introduced CopyToRoutine struct,
see 2e4127b6d2.

v15-0001 is the merged one of v14-0001 and v14-0002

There are some other ongoing *copy to/from* refactors[1] which we can benefit
to make the code cleaner, especially the checks done in ProcessCopyOptions.

[1]: https://www.postgresql.org/message-id/20250301.115009.424844407736647598.kou%40clear-code.com

--
Regards
Junwang Zhao

Attachment Content-Type Size
v15-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch application/octet-stream 10.2 KB
v15-0001-Introduce-json-format-for-COPY-TO.patch application/octet-stream 28.6 KB

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Krishnakant Mane 2025-03-02 06:35:21 Re: create_immv issue on aws Ubuntu even after create extention
Previous Message Adrian Klaver 2025-03-02 04:23:42 Re: create_immv issue on aws Ubuntu even after create extention

Browse pgsql-hackers by date

  From Date Subject
Next Message Gurjeet Singh 2025-03-02 06:09:47 lwlocknames.h beautification attempt
Previous Message Corey Huinker 2025-03-02 01:55:51 Re: Statistics Import and Export