Re: Bootstrap DATA is a pita

From: Caleb Welton <cwelton(at)pivotal(dot)io>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bootstrap DATA is a pita
Date: 2015-12-11 19:15:56
Message-ID: CAOjayEfKBL-_Q9m3Jsv6V-mK1q8h=ca5Hm0fecXGxZUhPDN9BA@mail.gmail.com

I'm happy to work these ideas forward if there is interest.

The basic design proposal is:
- Keep a minimal amount of bootstrap to avoid intrusive changes to core
components.
- Add the capability to create objects with specific OIDs via DDL during
initdb.
- Update the caching/resolution mechanism for builtin functions to be
more dynamic.
- Move as much of bootstrap as possible into SQL files and create the
catalog via DDL.

Feedback appreciated.

I can provide a sample patch if there is interest: about 500 lines of
combined diff for the infrastructure needed to support the above, not
including the modifications to pg_proc.h that would follow.

Thanks,
Caleb

On Thu, Dec 10, 2015 at 11:47 AM, Caleb Welton wrote:
>
>
> Hello Hackers,
>
> Reviving an old thread on simplifying the bootstrap process.
>
> I'm a developer from the GPDB / HAWQ side of the world, where we did some
> work a while back to enable catalog definition via SQL files, and we have
> found it valuable from a dev perspective. The mechanism currently in those
> products is a bit... convoluted: SQL is processed in Perl to create the
> existing DATA statements, which are then processed as they are today in
> Postgres. I wouldn't suggest this route, but having worked with both the
> DATA mechanism and the SQL-based one, I've certainly found SQL to be a more
> convenient way of interacting with the catalog.
>
> I'd propose:
> - Keep enough of the existing bootstrap mechanism functional to get a
> small, tidy core. Essentially you need enough of pg_type, pg_proc, pg_class
> and pg_attribute to support the 25 types used by catalog tables; almost
> everything else can be moved into SQL processing, the way system_views.sql
> is handled today.
>
> The above was largely proposed back in March and rejected based on
> concerns that
>
> 1. initdb would be slower.
> 2. It would introduce too much special purpose bootstrap cruft into the
> code.
> 3. Editing SQL commands is not comfortable in bulk.
>
> On 1.
>
> I have a prototype that handles about 1000 functions (all the functions in
> pg_proc.h that are not used by other catalog tables, e.g. pg_type,
> pg_language, pg_range, pg_aggregate, window functions, pg_ts_parser, etc).
>
> The whole of initdb runs in 1.53s, compared to 1.37s with the
> current bootstrap approach. So yes, this is slower, but not 'noticeably
> slower': I certainly didn't notice the 0.16s until I saw the concern and
> then timed it.
>
> On 2.
>
> So far the amount of cruft has been:
> - Allowing functions to be created with specific OIDs
> 1-line changes in pg_aggregate.c, proclang.c and typecmds.c
> about a dozen lines of code in functioncmds.c
> 3 lines changed in pg_proc.c
> - Update the fmgr_internal_validator for builtin functions while the
> catalog is mutable
> 3 lines changed in pg_proc.c
> - Update how the builtin function cache is built
> Some significant work in fmgr.c that honestly still needs cleanup
> before it would be ready to propose as a patch that would be worthy of
> committing.
> - Update how builtin functions are resolved outside of bootstrap
> Minor updates to dynloader for looking up symbols within the current
> executable. So far I've only done darwin.c for my prototype; this would
> need to be extended to the other ports.
> - Initialization of the builtin cache
> 2-line change in postinit.c
> - Addition of a stage in initdb to process the sql directives similar in
> scope to the processing of system_views.sql.
>
> No changes are needed in the parser, planner, etc. My assessment is that,
> with the right implementation, this is not a major concern in practice.
>
> On 3.
>
> Having worked with both SQL and bki DATA directives I have personally found
> the convenience of SQL outweighs the pain. In many cases, changes such as
> adding a new column to pg_proc have minimal impact on the SQL
> representation, and the changes that are needed are often simple to implement.
> E.g. accounting for COST only needs to be done for the functions that need
> something other than the default value. This however is somewhat
> subjective.
>
> On the Pros side:
>
> a. Debugging bootstrap is extremely painful; debugging once initdb has
> reached 'postgres --single' is much easier.
>
> b. It is easier to introduce minor issues with DATA directives than it is
> when using the SQL processing used for all other user objects.
>
> Example: currently in Postgres all builtin functions default to COST 1,
> and all SQL functions default to COST 100. However, the following SQL
> functions included in bootstrap are inexplicably initialized with a COST of
> 1:
> age(timestamp with time zone)
> age(timestamp without time zone)
> bit_length(bytea)
> bit_length(text)
> bit_length(bit)
> date_part(text, abstime)
> date_part(text, reltime)
> date_part(text, date)
> ... and 26 other examples
>
> c. SQL files are significantly less of a PITA (subjective opinion, but I
> can say this from a perspective of experience working with both DATA
> directives and SQL driven catalog definition).
>
> If people are interested, I can share my patch so far if that helps address
> concerns; if there is no interest, I'll probably leave my
> prototype where it is rather than investing more effort in the proof of
> concept.
>
> Thanks,
> Caleb
>
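To make point b concrete, here is a minimal, hypothetical sketch (not part of the prototype patch) of a consistency check for that class of mistake. It assumes catalog entries have already been extracted into (signature, language, cost) tuples; ProcEntry, default_cost and nondefault_costs are illustrative names only. It flags any entry whose COST differs from what CREATE FUNCTION would assign by default, which the PostgreSQL documentation gives as 1 for C-language and internal functions and 100 for all other languages.

```python
# Hypothetical sketch: flag catalog entries whose COST differs from the
# default CREATE FUNCTION would assign (1 for C/internal, 100 otherwise).
from typing import NamedTuple


class ProcEntry(NamedTuple):
    signature: str  # e.g. "age(timestamp with time zone)"
    language: str   # pg_language.lanname, e.g. "internal", "c", "sql"
    cost: float     # pg_proc.procost


def default_cost(language: str) -> float:
    """Default COST that CREATE FUNCTION assigns when none is given."""
    return 1.0 if language in ("internal", "c") else 100.0


def nondefault_costs(entries: list[ProcEntry]) -> list[ProcEntry]:
    """Return entries whose cost differs from their language's default."""
    return [e for e in entries if e.cost != default_cost(e.language)]


if __name__ == "__main__":
    # Illustrative rows only: the first two mirror examples listed above
    # (SQL-language functions carrying COST 1); the third is a
    # hypothetical internal function at its normal default.
    entries = [
        ProcEntry("age(timestamp with time zone)", "sql", 1.0),
        ProcEntry("bit_length(text)", "sql", 1.0),
        ProcEntry("hypothetical_internal_fn(integer)", "internal", 1.0),
    ]
    for e in nondefault_costs(entries):
        expected = default_cost(e.language)
        print(f"{e.signature}: COST {e.cost:g}, "
              f"default for {e.language} is {expected:g}")
```

Run against the illustrative rows, it reports the two SQL-language functions and skips the internal one. A flagged entry is not necessarily a bug, since some functions may carry a deliberate non-default COST; the check only surfaces them for review.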
