Re: Emitting JSON to file using COPY TO

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2025-01-27 08:16:26
Message-ID: CACJufxG_VY-Hv6ss8pqDxP8SaBmBPYUdv=LB3QFpsqtTZu9FYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

hi.

There are two ways we can use to represent the new copy format: json.
1.
typedef struct CopyFormatOptions
{
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
bool json_mode; /* JSON format? */
...
}

2.
typedef struct CopyFormatOptions
{
CopyFormat format; /* format of the COPY operation */
.....
}

typedef enum CopyFormat
{
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
COPY_FORMAT_JSON,
} CopyFormat;

both the sizeof(cstate->opts) (CopyToStateData.CopyFormatOptions) is 184.
so the struct size will not influence the performance.

I also did some benchmarks when using CopyFormat.
the following are the benchmarks info:
-------------------------------------------------------------------------------------------------------
create unlogged table t as select g from generate_series(1, 1_000_000) g;

build_type=release patch:
copy t to '/dev/null' json \watch i=0.1 c=10
last execution Time: 108.741 ms

copy t to '/dev/null' (format text) \watch i=0.1 c=10
last execution Time: 42.600 ms

build_type=release master:
copy t to '/dev/null' (format text) \watch i=0.1 c=10
last execution Time Time: 42.948 ms

---------------------------------------------------------
so a new version is attached, using the struct CopyFormatOptions.

changes mainly in CopyOneRowTo.
now it is:
""""
if(!cstate->rel)
{
memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
TupleDescAttr(cstate->queryDesc->tupDesc, 0),
cstate->queryDesc->tupDesc->natts *
sizeof(FormData_pg_attribute));
for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
populate_compact_attribute(slot->tts_tupleDescriptor, i);
BlessTupleDesc(slot->tts_tupleDescriptor);
}
""""
reasoning for change:
for composite_to_json to construct json key, we only need
FormData_pg_attribute.attname
but code path
composite_to_json->fastgetattr->TupleDescCompactAttr->verify_compact_attribute
means we also need to call populate_compact_attribute to populate
other attributes.

v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch,
author is by Joel Jacobson.
As I mentioned in above,
replacing 3 bool fields by an enum didn't change the struct CopyFormatOptions.
but consolidated 3 bool fields into one enum to make code more lean.
I think the refactoring (v14-0001) is worth it.

Attachment Content-Type Size
v14-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch text/x-patch 8.9 KB
v14-0002-introduce-json-format-for-COPY-TO.patch text/x-patch 17.5 KB
v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch text/x-patch 18.8 KB

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Laurenz Albe 2025-01-27 09:55:34 Re: Disabling vacuum truncate for autovacuum
Previous Message Tom Lane 2025-01-26 19:04:41 Re: Using Expanded Objects other than Arrays from plpgsql

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2025-01-27 08:52:16 Re: POC, WIP: OR-clause support for indexes
Previous Message Richard Guo 2025-01-27 08:05:50 Adjust tuples estimate for appendrels