From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | Joe Conway <mail(at)joeconway(dot)com> |
Cc: | "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Emitting JSON to file using COPY TO |
Date: | 2025-01-27 08:16:26 |
Message-ID: | CACJufxG_VY-Hv6ss8pqDxP8SaBmBPYUdv=LB3QFpsqtTZu9FYQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
hi.
There are two ways we can use to represent the new copy format: json.
1.
typedef struct CopyFormatOptions
{
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
bool json_mode; /* JSON format? */
...
}
2.
typedef struct CopyFormatOptions
{
CopyFormat format; /* format of the COPY operation */
.....
}
typedef enum CopyFormat
{
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
COPY_FORMAT_JSON,
} CopyFormat;
both the sizeof(cstate->opts) (CopyToStateData.CopyFormatOptions) is 184.
so the struct size will not influence the performance.
I also did some benchmarks when using CopyFormat.
the following are the benchmarks info:
-------------------------------------------------------------------------------------------------------
create unlogged table t as select g from generate_series(1, 1_000_000) g;
build_type=release patch:
copy t to '/dev/null' json \watch i=0.1 c=10
last execution Time: 108.741 ms
copy t to '/dev/null' (format text) \watch i=0.1 c=10
last execution Time: 42.600 ms
build_type=release master:
copy t to '/dev/null' (format text) \watch i=0.1 c=10
last execution Time Time: 42.948 ms
---------------------------------------------------------
so a new version is attached, using the struct CopyFormatOptions.
changes mainly in CopyOneRowTo.
now it is:
""""
if(!cstate->rel)
{
memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
TupleDescAttr(cstate->queryDesc->tupDesc, 0),
cstate->queryDesc->tupDesc->natts *
sizeof(FormData_pg_attribute));
for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
populate_compact_attribute(slot->tts_tupleDescriptor, i);
BlessTupleDesc(slot->tts_tupleDescriptor);
}
""""
reasoning for change:
for composite_to_json to construct json key, we only need
FormData_pg_attribute.attname
but code path
composite_to_json->fastgetattr->TupleDescCompactAttr->verify_compact_attribute
means we also need to call populate_compact_attribute to populate
other attributes.
v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch,
author is by Joel Jacobson.
As I mentioned in above,
replacing 3 bool fields by an enum didn't change the struct CopyFormatOptions.
but consolidated 3 bool fields into one enum to make code more lean.
I think the refactoring (v14-0001) is worth it.
Attachment | Content-Type | Size |
---|---|---|
v14-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch | text/x-patch | 8.9 KB |
v14-0002-introduce-json-format-for-COPY-TO.patch | text/x-patch | 17.5 KB |
v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch | text/x-patch | 18.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2025-01-27 09:55:34 | Re: Disabling vacuum truncate for autovacuum |
Previous Message | Tom Lane | 2025-01-26 19:04:41 | Re: Using Expanded Objects other than Arrays from plpgsql |
From | Date | Subject | |
---|---|---|---|
Next Message | Andrei Lepikhov | 2025-01-27 08:52:16 | Re: POC, WIP: OR-clause support for indexes |
Previous Message | Richard Guo | 2025-01-27 08:05:50 | Adjust tuples estimate for appendrels |