Re: Emitting JSON to file using COPY TO

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2024-09-13 14:42:00
Message-ID: CACJufxH=3PreKDHuoskn0GQmxUcw-n0k3ogN1CD4CvFbLBy-VA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi.

in ExecutePlan
we have:

for (;;)
{
ResetPerTupleExprContext(estate);
slot = ExecProcNode(planstate);
if (!TupIsNull(slot))
{
if((slot != NULL) && (slot->tts_tupleDescriptor != NULL)
&& (slot->tts_tupleDescriptor->natts > 0)
&& (slot->tts_tupleDescriptor->attrs->attname.data[0] == '\0'))
elog(INFO, "%s:%d %s this slot first attribute attname is
null", __FILE_NAME__, __LINE__, __func__);
}
if (TupIsNull(slot))
break;
if (sendTuples)
{
if (!dest->receiveSlot(slot, dest))
break;
}

dest->receiveSlot(slot, dest) is responsible for sending values to destination,
for COPY TO it will call copy_dest_receive, CopyOneRowTo.

For the copy to format json, we need to make sure
in "dest->receiveSlot(slot, dest))", the slot->tts_tupleDescriptor has
proper information.
because we *use* slot->tts_tupleDescriptor->attrs->attname as the json key.

For example, if (slot->tts_tupleDescriptor->attrs->attname.data[0] == '\0')
then output json may look like: {"":12}
which is not what we want.

in ExecutePlan i use
elog(INFO, "%s:%d %s this slot first attribute attname is null",
__FILE_NAME__, __LINE__, __func__);
to find sql queries that attribute name is not good.

based on that, i found out many COPY TO (FORMAT JSON) queries will either
error out or the output json key be empty string
if in CopyOneRowTo we didn't copy the cstate->queryDesc->tupDesc
to the slot->tts_tupleDescriptor

You can test it yourself.
first `git am v12-0001-introduce-json-format-for-COPY-TO.patch`
after that, comment out the memcpy call in CopyOneRowTo, just like the
following:
if(!cstate->rel)
{
// memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
// TupleDescAttr(cstate->queryDesc->tupDesc, 0),
// cstate->queryDesc->tupDesc->natts *
sizeof(FormData_pg_attribute));

build and test with the attached script.
you will see COPY TO FORMAT JSON, lots of cases where the json key
becomes an empty string.

I think this thread related issues has been resolved.

Attachment Content-Type Size
scratch31.sql application/sql 1.5 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2024-09-13 14:50:56 Re: Manual query vs trigger during data load
Previous Message François SIMON 2024-09-13 14:15:59 Re: post-bootstrap init : permission denied pg_description

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-09-13 14:56:19 Re: Add system column support to the USING clause
Previous Message Peter Eisentraut 2024-09-13 14:39:56 Re: tiny step toward threading: reduce dependence on setlocale()