Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2023-12-15 00:53:05
Message-ID: 20231215.095305.1361997086905276509.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <CAD21AoCZv3cVU+NxR2s9J_dWvjrS350GFFr2vMgCH8wWxQ5hTQ(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 15 Dec 2023 05:19:43 +0900,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> To avoid collisions, extensions can be created in a
> different schema than public.

Thanks. I didn't notice it.

> And note that built-in format copy handler doesn't need to
> declare its handler function.

Right. I know it.

> Adding a prefix or suffix would be one option but to give extensions
> more flexibility, another option would be to support format = 'custom'
> and add the "handler" option to specify a copy handler function name
> to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
> HANDLER = 'arrow_copy_handler').

Interesting. If we use this option, users can choose an COPY
FORMAT implementation they like from multiple
implementations. For example, a developer may implement a
COPY FROM FORMAT = 'json' handler with PostgreSQL's JSON
related API and another developer may implement a handler
with simdjson[1] which is a fast JSON parser. Users can
choose whichever they like.

But specifying HANDLER = '...' explicitly is a bit
inconvenient. Because only one handler will be installed in
most use cases. In the case, users don't need to choose one
handler.

If we choose this option, it may be better that we also
provide a mechanism that can work without HANDLER. Searching
a function by name like tablesample method does is an option.

[1]: https://github.com/simdjson/simdjson

Thanks,
--
kou

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2023-12-15 01:20:12 Re: Change GUC hashtable to use simplehash?
Previous Message Ashutosh Bapat 2023-12-14 23:52:54 Re: Memory consumed by paths during partitionwise join planning