Re: extensions are hitting the ceiling

From: Eric Hanson <eric(at)aquameta(dot)com>
To: Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: extensions are hitting the ceiling
Date: 2019-03-19 17:36:59
Message-ID: CACA6kxjUMciJN_Y95xR6XWTLrJeiciOL8WNxinHLDoi3yNiE_A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 18, 2019 at 11:56 PM Chapman Flack <chap(at)anastigmatix(dot)net>
wrote:

> On 03/18/19 22:38, Eric Hanson wrote:
> > rows are really second class citizens: They aren't tracked with
> > pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> > etc.
>
> This. You have other interests as well, but this is the one I was thinking
> about a few years ago in [1] (starting at "Ok, how numerous would be the
> problems with this:").
>

Cool!

First thoughts: it seems like a sensible way to go, given the premise that
extensions are immutable. But -- I'd be a little concerned about the
performance ramifications. Usually there are not jillions of database
objects in an extension, but if extensions started containing data, there
sure could be jillions of rows. Every row would have to be checked for
existence as part of an extension on every insert or update, no?

> Nobody ever chimed in to say how numerous they did or didn't think the
> problems would be. I was actually thinking recently about sitting down
> and trying to write that patch, as no one had exactly stood up to say
> "oh heavens no, don't write that." But my round tuits are all deployed
> elsewhere at the moment.
>

Likewise, if nobody tells me "oh sheeze extensions can already do all this"
I'm going to assume they can't. :-)

> I'd still like to discuss the ideas.
>

Me too!

Ok, I should probably come out and say it: I think the user story of
"There is some kind of packaging system that can contain both schema and
data, and these packages can be installed and removed along with their
dependencies atomically" is fairly obvious and desirable. But getting
there while accepting the premises that are currently baked into extensions
might be a tall order.

Extensions have a middleware-ish aspect to them -- they are immutable, and
that immutability is checked and enforced at runtime. That might scale
just fine to a few dozen database objects that only check pg_depend on DDL
operations, but if we introduce record tracking and start sticking sticks
into the wheels of the DML, it seems things could go south really quickly.

I really like a more git-like pattern, where you are free to modify the
working copy of a repository (or in this case an extension), and instead of
being blocked from changing things, the system tells the user what has
changed and how, and gives sensible options for what to do about it. That
way it doesn't incur a performance hit, and the user can do a kind of "git
status" on their extension to show any changes.
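To make that "status" idea concrete, here's a rough sketch of what such a
check could look like: a snapshot of each tracked object's definition is
stored at commit time and diffed against the live catalog only when the user
asks. (`bundle.tracked_object` and `live_definitions` are hypothetical names
for illustration, not anything that exists today.)

```sql
-- Hypothetical "status" query: compare committed definitions in a bundle
-- against the live definitions currently in the database.
SELECT t.id,
       CASE
           WHEN l.definition IS NULL         THEN 'deleted'
           WHEN l.definition <> t.definition THEN 'modified'
           ELSE 'unchanged'
       END AS status
FROM bundle.tracked_object t
LEFT JOIN live_definitions l USING (id);
```

The point being that nothing runs per-DML; the comparison cost is paid only
when the user explicitly asks for a status.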

How about an extension system whose first principle is that an extension is
made up of rows, period? What about the DDL, you ask? Well...

Imagine a system catalog whose sole purpose is to contain database object
definitions like "CREATE VIEW ...", similar to those produced by
pg_catalog.pg_get_viewdef(), pg_catalog.pg_get_functiondef(), etc. Let's
call this catalog `def`. There is exactly one VIEW for each type of database
object in PostgreSQL: def.table, def.role, def.sequence, def.operator,
def.type, etc. Each def.* VIEW contains only two columns, `id` and
`definition`. The `id` column contains a unique identifier for the object,
and the `definition` column contains the SQL statement that will recreate
the object.
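As a rough sketch, one of these views might be built on pg_class and
pg_get_viewdef() like so (`def` is of course a hypothetical schema, and a
real implementation would need to handle schemas, options, etc.):

```sql
-- Hypothetical def.view: one row per user-defined view in the database,
-- with a full CREATE statement as the definition.
CREATE SCHEMA def;

CREATE VIEW def.view AS
SELECT c.oid::regclass::text AS id,
       'CREATE VIEW ' || c.oid::regclass::text || ' AS ' ||
           pg_get_viewdef(c.oid) AS definition
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'v'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema', 'def');
```

pg_get_viewdef() only returns the SELECT body, hence the concatenated
"CREATE VIEW" prefix; object types without a pg_get_*def function are
exactly the gap mentioned below.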

So, inside this system catalog is the SQL definition statement of every
database object. In theory, the contents of all the `definition` columns
together would be similar to the contents of pg_dump --schema-only.

Now, imagine all these def.* views had insert triggers, so that each insert
actually executes the contents of the `definition` column. In theory, we
could pg_restore the data in the def.* views, and it would recreate all the
database objects. This would shift all that logic out of pg_dump and into
the database.
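A minimal sketch of such a trigger, assuming a hypothetical def.view as
described above (since def.view is a view, an INSTEAD OF trigger is what
makes it writable):

```sql
-- On insert into def.view, execute the definition to create the object.
CREATE FUNCTION def.run_definition() RETURNS trigger AS $$
BEGIN
    EXECUTE NEW.definition;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER view_insert
    INSTEAD OF INSERT ON def.view
    FOR EACH ROW EXECUTE PROCEDURE def.run_definition();

-- e.g. inserting a row actually creates the view:
-- INSERT INTO def.view (id, definition)
-- VALUES ('public.my_view', 'CREATE VIEW public.my_view AS SELECT 1');
```

Obviously a real version would need to worry about ordering, dependencies,
and the usual EXECUTE-arbitrary-SQL caveats; this is just the shape of the
idea.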

So using the def.* catalog, we could package both "regular" table data and
system objects via the contents of the def.* catalog views. Packages are a
collection of rows, period. Build up from there.

I'm working on a prototype called bundle [1]; it still has a ways to go but
it's showing some promise. It is going to require bringing into PostgreSQL
the missing pg_get_*def functions, as folks have talked about before [2].

Thanks,
Eric

[1]
https://github.com/aquametalabs/aquameta/tree/master/src/pg-extension/bundle
[2]
https://www.postgresql.org/message-id/20130429234634.GA10380@tornado.leadboat.com
