Re: Make COPY format extendable: Extract COPY TO format implementations

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Sutou Kouhei <kou(at)clear-code(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "zhjwpku(at)gmail(dot)com" <zhjwpku(at)gmail(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-03-22 05:23:56
Message-ID: CAKFQuwY9ZNN-vgW8JnL=GV7UFVPWFzqeqVqEQN81NXQF+pAb7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, March 21, 2025, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> On Fri, Mar 21, 2025 at 5:32 PM David G. Johnston
> <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> >
> > On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
> >>
> >> Hi,
> >>
> >> In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA(at)mail(dot)gmail(dot)com>
> >> "Re: Make COPY format extendable: Extract COPY TO format
> implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
> >> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >>
> >> > I think that built-in formats also need to have their handler
> >> > functions. This seems to be a conventional way for customizable
> >> > features such as tablesample and access methods, and we can simplify
> >> > this function.
> >>
> >> OK. 0008 in the attached v37 patch set does it.
> >>
> >
> > tl/dr;
> >
> > We need to exclude from our SQL function search any function that
> doesn't declare copy_handler as its return type.
> > ("function text must return type copy_handler" is not an acceptable
> error message)
> >
> > We need to accept identifiers in FORMAT and parse the optional catalog,
> schema, and object name portions.
> > (Restrict our function search to the named schema if provided.)
> >
> > Detail:
> >
> > Fun thing...(not sure how much of this is covered above: I do see, but
> didn't scour, the security discussion):
> >
> > -- create some poison
> > create function public.text(internal) returns boolean language c as
> '/home/davidj/gotya/gotya', 'gotit';
> > CREATE FUNCTION
> >
> > -- inject it
> > postgres=# set search_path to public,pg_catalog;
> > SET
> >
> > -- watch it die
> > postgres=# copy (select 1) to stdout (format text);
> > ERROR: function text must return type copy_handler
> > LINE 1: copy (select 1) to stdout (format text);
> >
> > I'm especially concerned about extensions here.
> >
> > We shouldn't be locating any SQL function that doesn't have a
> copy_handler return type. Unfortunately, LookupFuncName seems incapable of
> doing what we want here. I suggest we create a new lookup routine where we
> can specify the return argument type as a required element. That would
> cleanly mitigate the denial-of-service attack/accident vector demonstrated
> above (the text returning function should have zero impact on how this
> feature behaves). If someone does create a handler SQL function without
> using copy_handler return type we'd end up showing "COPY format 'david' not
> recognized" - a developer should be able to figure out they didn't put a
> correct return type on their handler function and that is why the system
> did not register it.
>
> Just to be clear, the patch checks the function's return type before
> calling it:
>
> funcargtypes[0] = INTERNALOID;
> handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
>
> funcargtypes, true);
> if (!OidIsValid(handlerOid))
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> errmsg("COPY format \"%s\" not
> recognized", format),
> parser_errposition(pstate,
> defel->location)));
>
> /* check that handler has correct return type */
> if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
> ereport(ERROR,
> (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> errmsg("function %s must return type %s",
> format, "copy_handler"),
> parser_errposition(pstate,
> defel->location)));
>
> So would changing the error message to like "COPY format 'text' not
> recognized" untangle your concern?

In my example above copy should not fail at all. The text function created
in public that returns Boolean would never be seen and the real one in
pg_catalog would then be found and behave as expected.

>
> FYI the same is true for TABLESAMPLE; it invokes a function with the
> specified method name and checks the returned Node type:
>
> =# select * from pg_class tablesample text (0);
> ERROR: function text must return type tsm_handler

Then this would benefit from the new function I suggest creating since it
apparently has the same, IMO, bug.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-03-22 05:48:41 Re: Proposal - Allow extensions to set a Plan Identifier
Previous Message Masahiko Sawada 2025-03-22 05:07:50 Re: Make COPY format extendable: Extract COPY TO format implementations