Re: WIP: Generic functions for Node types using generated metadata

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Generic functions for Node types using generated metadata
Date: 2019-08-30 12:35:34
Message-ID: alpine.DEB.2.21.1908301414100.28828@lancre
Lists: pgsql-hackers


Hello Andres,

Just my 0.02 €:

> There's been a lot of complaints over the years about how annoying it is
> to keep the out/read/copy/equalfuncs.c functions in sync with the actual
> underlying structs.
>
> There've been various calls for automating their generation, but no
> actual patches that I am aware of.

I started something a while back, AFAICR after spending a stupid amount of
time looking for a stupid missing field copy or whatever. I wrote a (simple)
perl script deriving all (most) node utility functions from the header
files.

I gave up as the idea did not gather much momentum from committers, so I
assumed the effort would be rejected in the end. AFAICR the spirit of the
feedback was something like "node definitions do not change often, we can
manage it by hand".

> There also recently has been discussion about generating more efficient
> memory layout for node trees that we know are read only (e.g. plan trees
> inside the plancache), and about copying such trees more efficiently
> (e.g. by having one big allocation, and then just adjusting pointers).

If pointers are relative to the start of the allocation, they could be just
indexes that do not need much adjusting.
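
To illustrate, here is a minimal sketch of a tree whose child links are
indexes into one flat buffer rather than pointers. The record layout and the
names (NODE, add_node, tree_sum) are hypothetical and not taken from
PostgreSQL; the point is only that a byte-for-byte copy stays valid without
any per-node fix-up.

```python
import struct

# Hypothetical fixed-size node record: (value, left_index, right_index).
# Children are stored as indexes into the buffer, not as addresses.
NODE = struct.Struct("<iii")
NO_CHILD = -1


def add_node(buf, value, left=NO_CHILD, right=NO_CHILD):
    """Append a node to the flat buffer and return its index."""
    index = len(buf) // NODE.size
    buf.extend(NODE.pack(value, left, right))
    return index


def tree_sum(buf, index):
    """Walk the tree through index links and sum the node values."""
    if index == NO_CHILD:
        return 0
    value, left, right = NODE.unpack_from(buf, index * NODE.size)
    return value + tree_sum(buf, left) + tree_sum(buf, right)


tree = bytearray()
leaf_a = add_node(tree, 1)
leaf_b = add_node(tree, 2)
root = add_node(tree, 10, leaf_a, leaf_b)

# One flat copy of the whole allocation; the index links need no adjusting.
tree_copy = bytes(tree)
print(tree_sum(tree_copy, root))
```

The trade-off is that every access goes through "base + index * size"
instead of a direct dereference.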

> One way to approach this problem would be to parse the type
> definitions, and directly generate code for the various functions. But
> that does mean that such a code-generator needs to be expanded for each
> such function.

That was no big deal in the effort I made. The harder issue was dealing with
exceptions (e.g. "we do not serialize this field because it is not used for
some reason") and understanding some implicit assumptions in the struct
declarations.

> An alternative approach is to have a parser of the node definitions that
> doesn't generate code directly, but instead generates metadata. And then
> use that metadata to write node aware functions. This seems more
> promising to me.

Hmmm. The approach we had in an (old) research project was to write the
metadata, and derive all structs & utility functions from it. It is
simpler this way because you save parsing some C, and it can be made
language agnostic (i.e. serializing the data structure from one language and
reading its value from another).
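
A minimal sketch of that metadata-first direction, under loud assumptions:
the metadata format, the "scalar"/"node" kinds, and the emitted copy_node()
helper are all hypothetical, and the output is only illustrative C text, not
what PostgreSQL's copyfuncs.c looks like.

```python
# Hypothetical metadata: node name -> list of (field name, kind).
# "scalar" fields are copied by assignment, "node" fields by a deep copy.
NODE_METADATA = {
    "Example": [("value", "scalar"), ("child", "node")],
}

# Assumed mapping from metadata kinds to C types.
C_TYPES = {"scalar": "int", "node": "struct Node *"}


def generate_struct(name, fields):
    """Emit a C struct declaration from the metadata."""
    lines = [f"typedef struct {name}", "{"]
    for field, kind in fields:
        ctype = C_TYPES[kind]
        sep = "" if ctype.endswith("*") else " "
        lines.append(f"\t{ctype}{sep}{field};")
    lines.append(f"}} {name};")
    return "\n".join(lines)


def generate_copy(name, fields):
    """Emit a C copy function; copy_node() is a hypothetical helper."""
    lines = [
        f"static {name} *",
        f"copy_{name}(const {name} *from)",
        "{",
        f"\t{name} *newnode = malloc(sizeof({name}));",
        "",
    ]
    for field, kind in fields:
        if kind == "node":
            lines.append(f"\tnewnode->{field} = copy_node(from->{field});")
        else:
            lines.append(f"\tnewnode->{field} = from->{field};")
    lines += ["", "\treturn newnode;", "}"]
    return "\n".join(lines)


for node_name, node_fields in NODE_METADATA.items():
    print(generate_struct(node_name, node_fields))
    print(generate_copy(node_name, node_fields))
```

Since the metadata is the single source, a generator for another language
could read the same table.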

> I'm fairly sure this metadata can also be used to write the other
> currently existing node functions.

Beware of strange exceptions…

> With regards to using libclang for the parsing: I chose that because it
> seemed the easiest to experiment with, compared to annotating all the
> structs with enough metadata to be able to easily parse them from a perl
> script.

I did not find this an issue when I tried, because the annotation needed
is basically the type name of the field.

> The node definitions are after all distributed over quite a few headers.

Yep.

> I think it might even be the correct way forward, over inventing our own
> mini-languages and writing ad-hoc parsers for those. It sure is easier
> to understand plain C code, compared to having to understand various
> embedded mini-languages consisting of macros.

Dunno.

> The obvious drawback is that it'd require more people to install
> libclang - a significant imposition.

Indeed. A perl-only dependency would be much simpler than relying on a
particular library from a particular compiler to compile postgres,
possibly with an unrelated compiler.

> Alternatively we could annotate the code enough to be able to write our
> own parser, or use some other C parser.

If you can dictate some conventions, e.g. one field per line, simple perl
regular expressions would work well I think; you would not need a parser
per se.
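
As a rough sketch of how far a one-field-per-line convention gets you, the
following uses Python rather than perl. The sample struct, the regular
expression, and the "no_copy" trailing-comment annotation are all
hypothetical; the annotation only shows one way a per-field exception could
be marked.

```python
import re

# Hypothetical header text following a "one field per line" convention.
HEADER = """\
typedef struct Example
{
    NodeTag     type;
    int         value;
    char       *name;
    struct Example *child;  /* no_copy */
} Example;
"""

# type, optional '*', field name, ';', optional trailing comment annotation.
FIELD_RE = re.compile(
    r"^\s*(?P<type>[A-Za-z_][\w\s]*?)\s*(?P<ptr>\*?)\s*(?P<name>\w+)\s*;"
    r"\s*(?:/\*\s*(?P<note>.*?)\s*\*/)?\s*$"
)


def parse_fields(header):
    """Return (C type, field name, annotation or None) for each field line."""
    fields = []
    for line in header.splitlines():
        m = FIELD_RE.match(line)
        if m:
            ctype = m.group("type") + (" *" if m.group("ptr") else "")
            fields.append((ctype, m.group("name"), m.group("note")))
    return fields


for field in parse_fields(HEADER):
    print(field)
```

This deliberately ignores arrays, function pointers, multi-level pointers,
and declarations spanning several lines; the convention is what keeps the
expression this small.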

> I don't really want to invest significantly more time into this without
> first debating the general idea.

That is what I did, and I quit quickly :-)

On the general idea, I'm 100% convinced that stupid utility functions
should be either generic or generated, not maintained by hand.

--
Fabien.
