Re: TABLESAMPLE patch is really in pretty sad shape

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE patch is really in pretty sad shape
Date: 2015-07-12 22:36:51
Message-ID: 5438.1436740611@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> The two contrib modules this patch added are nowhere near fit for public
> consumption. They cannot clean up after themselves when dropped:
> ...
> Raw inserts into system catalogs just
> aren't a sane thing to do in extensions.

I had some thoughts about how we might fix that, without going to the
rather tedious lengths of creating a complete set of DDL infrastructure
for CREATE/DROP TABLESAMPLE METHOD.

First off, the extension API designed for the tablesample patch is
evidently modeled on the index AM API, which was not a particularly good
precedent --- it's not a coincidence that index AMs can't be added or
dropped on-the-fly. Modeling a server internal API as a set of
SQL-visible functions is problematic when the call signatures of those
functions can't be accurately described by SQL datatypes, and it's rather
pointless and inefficient when none of the functions in question is meant
to be SQL-callable. It's even less attractive if you don't think you've
got a completely stable API spec, because adding, dropping, or changing
signature of any one of the API functions then involves a pile of
easy-to-mess-up catalog changes on top of the actually useful work.
Not to mention then having to think about backwards compatibility of
your CREATE command's arguments.

We have a far better model to follow already, namely the foreign data
wrapper API. In that, there's a single SQL-visible function that returns
a dummy datatype indicating that it's an FDW handler, and when called,
it hands back a struct containing pointers to all the other functions
that the particular wrapper needs to supply (and, if necessary, the struct
could have non-function-pointer fields containing other info the core
system might need to know about the handler). We could similarly invent a
pseudotype "tablesample_handler" and represent each tablesample method by
a single SQL function returning tablesample_handler. Any future changes
in the API for tablesample handlers would then appear as changes in the C
definition of the struct returned by the handler, which requires no
SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it
is pretty easy to design it so that failure to update an external module
that implements the API results in C compiler errors and/or sane fallback
behavior.

Once we've done that, I think we don't even need a special catalog
representing tablesample methods. Given "FROM foo TABLESAMPLE
bernoulli(...)", the parser could just look for a function bernoulli()
returning tablesample_handler, and it's done. The querytree would have an
ordinary function dependency on that function, which would be sufficient
to handle DROP dependency behaviors properly. (On reflection, maybe
better if it's "bernoulli(internal) returns tablesample_handler",
so as to guarantee that such a function doesn't create a conflict with
any user-defined function of the same name.)

Thoughts?

regards, tom lane

PS: now that I've written this rant, I wonder why we don't redesign the
index AM API along the same lines. It probably doesn't matter much at
the moment, but if we ever get serious about supporting index AM
extensions, I think we ought to consider doing that.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-07-13 01:36:44 Re: Fixes for a couple of resource leaks
Previous Message David Fetter 2015-07-12 22:26:41 Re: creating extension including dependencies