Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Sutou Kouhei <kou(at)clear-code(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-03-22 05:07:50
Message-ID: CAD21AoCjK=NC62Z+VUip5-rc_rZz-6h9Dqacxo5G5PdK4kKKwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 21, 2025 at 5:32 PM David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
> On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>>
>> Hi,
>>
>> In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA(at)mail(dot)gmail(dot)com>
>> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
>> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>> > I think that built-in formats also need to have their handler
>> > functions. This seems to be a conventional way for customizable
>> > features such as tablesample and access methods, and we can simplify
>> > this function.
>>
>> OK. 0008 in the attached v37 patch set does it.
>>
>
> tl/dr;
>
> We need to exclude from our SQL function search any function that doesn't declare copy_handler as its return type.
> ("function text must return type copy_handler" is not an acceptable error message)
>
> We need to accept identifiers in FORMAT and parse the optional catalog, schema, and object name portions.
> (Restrict our function search to the named schema if provided.)
>
> Detail:
>
> Fun thing...(not sure how much of this is covered above: I do see, but didn't scour, the security discussion):
>
> -- create some poison
> create function public.text(internal) returns boolean language c as '/home/davidj/gotya/gotya', 'gotit';
> CREATE FUNCTION
>
> -- inject it
> postgres=# set search_path to public,pg_catalog;
> SET
>
> -- watch it die
> postgres=# copy (select 1) to stdout (format text);
> ERROR: function text must return type copy_handler
> LINE 1: copy (select 1) to stdout (format text);
>
> I'm especially concerned about extensions here.
>
> We shouldn't be locating any SQL function that doesn't have a copy_handler return type. Unfortunately, LookupFuncName seems incapable of doing what we want here. I suggest we create a new lookup routine where we can specify the return argument type as a required element. That would cleanly mitigate the denial-of-service attack/accident vector demonstrated above (the text returning function should have zero impact on how this feature behaves). If someone does create a handler SQL function without using copy_handler return type we'd end up showing "COPY format 'david' not recognized" - a developer should be able to figure out they didn't put a correct return type on their handler function and that is why the system did not register it.

Just to be clear, the patch checks the function's return type before calling it:

funcargtypes[0] = INTERNALOID;
handlerOid = LookupFuncName(list_make1(makeString(format)), 1,

funcargtypes, true);
if (!OidIsValid(handlerOid))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY format \"%s\" not
recognized", format),
parser_errposition(pstate, defel->location)));

/* check that handler has correct return type */
if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("function %s must return type %s",
format, "copy_handler"),
parser_errposition(pstate, defel->location)));

So would changing the error message to like "COPY format 'text' not
recognized" untangle your concern?

FYI the same is true for TABLESAMPLE; it invokes a function with the
specified method name and checks the returned Node type:

=# select * from pg_class tablesample text (0);
ERROR: function text must return type tsm_handler

A difference between TABLESAMPLE and COPY format is that the former
accepts a qualified name but the latter doesn't:

=# create extension tsm_system_rows ;
=# create schema s1;
=# create function s1.system_rows(internal) returns void language c as
'tsm_system_rows.so', 'tsm_system_rows_handler';
=# \df *.system_rows
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+-------------+------------------+---------------------+------
public | system_rows | tsm_handler | internal | func
s1 | system_rows | void | internal | func
(2 rows)
postgres(1:1194923)=# select count(*) from pg_class tablesample system_rows(0);
count
-------
0
(1 row)

postgres(1:1194923)=# select count(*) from pg_class tablesample
s1.system_rows(0);
ERROR: function s1.system_rows must return type tsm_handler

> A second concern is simply people wanting to name things the same; or, why namespaces were invented.

Yeah, I think that the custom COPY format should support qualified
names at least.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2025-03-22 05:23:56 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Nathan Bossart 2025-03-22 03:42:06 Re: [PATCH] SVE popcount support