Bootstrap DATA is a pita

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Bootstrap DATA is a pita
Date: 2015-02-20 23:41:42
Message-ID: 20150220234142.GH12653@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

I've been rather annoyed for a long while about how cumbersome it is to
add catalog rows using the bootstrap format. pg_proc.h, pg_operator.h,
pg_amop.h, pg_amproc.h and a few others are especially unwieldy.

I think this needs to be improved. And while I'm not going to start
working on it tonight, I do plan to work on it if we can agree on a
design that I think is worth implementing.

The things that bug me most are:

1) When adding new rows it's rather hard to know which columns are which,
and you have to specify a lot of values you really don't care about.
That's especially annoying in pg_proc.

2) Having to assign oids for many things that don't actually need one is
bothersome and greatly increases the likelihood of conflicts. There
are some rows for which we need fixed oids (pg_type ones for example),
but e.g. for the majority of pg_proc it's unnecessary.

3) Adding a new column to a system catalog, especially pg_proc.h,
basically requires writing a complex regex or program to modify the
header.

Therefore I propose that we add another format to generate the .bki
insert lines.

What I think we should do is add pg_<catalog>.data files that contain
the actual data and are automatically parsed by Catalog.pm. Those
contain the rows in some to-be-decided format. I was considering using
JSON, but it turns out Perl only started shipping JSON::PP as part of
the standard library in 5.14. So I guess it's best we just make it a big
Perl array + hashes.
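
A minimal sketch of why plain Perl data needs no extra module: the file
can be read with the built-in "do FILE". Everything here is an assumption
for illustration. load_catalog_data is a made-up helper, not something in
Catalog.pm, and the file is simplified to a bare array of row hashes
rather than any agreed format.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny stand-in for a pg_type.data file so the sketch is runnable.
my ($fh, $path) = tempfile(SUFFIX => '.data', UNLINK => 1);
print $fh <<'EOF';
[
    { oid => 16, data => { typname => 'bool', typlen => 1 }, oiddefine => 'BOOLOID' },
    { oid => 23, data => { typname => 'int4', typlen => 4 }, oiddefine => 'INT4OID' },
]
EOF
close $fh or die "close failed: $!";

# Hypothetical helper: evaluate the file as Perl and return its last
# value, which is expected to be a reference to an array of row hashes.
sub load_catalog_data
{
    my ($file) = @_;
    my $rows = do $file;
    die "could not parse $file: $@" if $@;
    die "could not read $file: $!" unless defined $rows;
    die "$file did not return an array reference" unless ref $rows eq 'ARRAY';
    return $rows;
}

my $rows = load_catalog_data($path);
print scalar(@$rows), " rows, first is $rows->[0]{data}{typname}\n";
```

The obvious trade-off is that "do" executes arbitrary Perl, which seems
acceptable for files that live in our own source tree.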

To address 1) we just need to make each row a hash and allow leaving out
columns that have some default value.

2) is a bit more complex. Generally many rows don't need a fixed oid at
all, and many others primarily need it to handle object descriptions. The
latter seems best solved by not making descriptions dependent on the oid
anymore.

3) Seems primarily solved by not requiring default values to be
specified anymore. Also it should be much easier to add new values
automatically to a parseable format.

I think we'll need to generate oid #defines for some catalog contents,
but that seems solvable.
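
As a rough sketch of that generation step, assuming each row may carry an
optional 'oiddefine' key next to a fixed oid (as in the rough example in
this message): rows that have both yield a #define, all others are
skipped. oid_defines is a hypothetical name, not an existing function.

```perl
use strict;
use warnings;

# Hypothetical rows in the proposed format. The last one has no fixed
# oid and no define name, so it must not produce any output.
my @rows = (
    { oid => 16, data => { typname => 'bool' }, oiddefine => 'BOOLOID' },
    { oid => 23, data => { typname => 'int4' }, oiddefine => 'INT4OID' },
    { data => { typname => 'no_fixed_oid' } },
);

# Made-up helper: return one "#define NAME oid" line per row that has
# both an oid and an oiddefine entry.
sub oid_defines
{
    my ($rows) = @_;
    my @lines;
    foreach my $row (@$rows)
    {
        next unless defined $row->{oid} && defined $row->{oiddefine};
        push @lines, sprintf("#define %s %d", $row->{oiddefine}, $row->{oid});
    }
    return @lines;
}

print "$_\n" for oid_defines(\@rows);
```

For the three rows above this emits defines for BOOLOID and INT4OID only.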

Maybe something roughly like:

# pg_type.data
CatalogData(
    'pg_type',
    [
        {
            oid => 2275,
            data => {typname => 'cstring', typlen => -2, typbyval => 0, fake => '...'},
            oiddefine => 'CSTRINGOID'
        }
    ]
);

# pg_proc.data
CatalogData(
    'pg_proc',
    [
        {
            oid => 1242,
            data => {proname => 'boolin', prorettype => 16, proargtypes => [2275], provolatile => 'i'},
            description => 'I/O',
        },
        {
            data => {proname => 'mode_final', prorettype => 2283, proargtypes => [2281, 2283]},
            description => 'aggregate final function',
        }
    ]
);

There'd need to be some logic to assign default values for columns, and
maybe even simple logic e.g. to determine arguments like pronargs based
on proargtypes.
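
A minimal sketch of what that logic could look like, under stated
assumptions: the default values below are placeholders rather than agreed
defaults, and expand_row is a hypothetical helper. It fills in omitted
columns and derives pronargs from proargtypes unless the row sets
pronargs explicitly.

```perl
use strict;
use warnings;

# Assumed defaults for illustration only. The column names exist in
# pg_proc, but which default each one should get is undecided.
my %pg_proc_defaults = (
    provolatile => 'v',
    procost     => 1,
    prorows     => 0,
    proargtypes => [],
);

# Hypothetical helper: explicit values in the row override the defaults,
# then pronargs is computed from the argument type list if not given.
sub expand_row
{
    my ($defaults, $row) = @_;
    my %data = (%$defaults, %{ $row->{data} });
    $data{pronargs} = scalar @{ $data{proargtypes} }
        unless exists $data{pronargs};
    my %expanded = (%$row, data => \%data);
    return \%expanded;
}

my $row = expand_row(
    \%pg_proc_defaults,
    {
        data => { proname => 'mode_final', prorettype => 2283, proargtypes => [2281, 2283] },
        description => 'aggregate final function',
    });

printf "%s: pronargs=%d provolatile=%s\n",
    $row->{data}{proname}, $row->{data}{pronargs}, $row->{data}{provolatile};
```

Here the row only spells out three columns; pronargs comes out as 2 and
provolatile falls back to the assumed default.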

This is far from fully thought through, but I think something very
roughly along these lines could be a remarkable improvement in the ease
of adding new catalog contents.

Comments?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
