Re: Make COPY format extendable: Extract COPY TO format implementations

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: sawada(dot)mshk(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-03-22 00:31:54
Message-ID: CAKFQuwbsD9Dh_52BMzGkkfkNiWeP_SVx-rq-=x_2qsbxHwf1-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:

> Hi,
>
> In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format
> implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > I think that built-in formats also need to have their handler
> > functions. This seems to be a conventional way for customizable
> > features such as tablesample and access methods, and we can simplify
> > this function.
>
> OK. 0008 in the attached v37 patch set does it.
>
>
tl/dr;

We need to exclude from our SQL function search any function that doesn't
declare copy_handler as its return type.
("function text must return type copy_handler" is not an acceptable error
message)

We need to accept identifiers in FORMAT and parse the optional catalog,
schema, and object name portions.
(Restrict our function search to the named schema if provided.)

Detail:

Fun thing...(not sure how much of this is covered above: I do see, but
didn't scour, the security discussion):

-- create some poison
create function public.text(internal) returns boolean language c as
'/home/davidj/gotya/gotya', 'gotit';
CREATE FUNCTION

-- inject it
postgres=# set search_path to public,pg_catalog;
SET

-- watch it die
postgres=# copy (select 1) to stdout (format text);
ERROR: function text must return type copy_handler
LINE 1: copy (select 1) to stdout (format text);

I'm especially concerned about extensions here.

We shouldn't be locating any SQL function that doesn't have a copy_handler
return type. Unfortunately, LookupFuncName seems incapable of doing what
we want here. I suggest we create a new lookup routine where we can
specify the return argument type as a required element. That would cleanly
mitigate the denial-of-service attack/accident vector demonstrated above
(the text returning function should have zero impact on how this feature
behaves). If someone does create a handler SQL function without using
copy_handler return type we'd end up showing "COPY format 'david' not
recognized" - a developer should be able to figure out they didn't put a
correct return type on their handler function and that is why the system
did not register it.

A second concern is simply people wanting to name things the same; or, why
namespaces were invented.

Can we just accept a proper identifier after FORMAT so we can use
schema-qualified names?

(FORMAT "davescopyformat"."david")

We can special case the internal schema-less names and internally force
pg_catalog to avoid them being shadowed.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-03-22 03:42:06 Re: [PATCH] SVE popcount support
Previous Message Rustam ALLAKOV 2025-03-21 23:47:10 Re: Patch: Cover POSITION(bytea,bytea) with tests