Re: Coccinelle for PostgreSQL development [4/N]: correcting palloc() use

From: Mats Kindahl <mats(at)timescale(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Coccinelle for PostgreSQL development [4/N]: correcting palloc() use
Date: 2025-01-09 07:54:56
Message-ID: CA+14424CWusjBxX5Hc54DTc1XAk1DBL4JGhHeqhp0XMYo-O=-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 8, 2025 at 10:02 PM Peter Eisentraut <peter(at)eisentraut(dot)org>
wrote:

> On 07.01.25 20:49, Mats Kindahl wrote:
> > This is the first example semantic patch and shows how to capture and
> > fix a common problem.
> >
> > If you use an palloc() to allocate memory for an object (or an array of
> > objects) and by mistake type something like:
> >
> > StringInfoData *info = palloc(sizeof(StringInfoData*));
> >
> > You will not allocate enough memory for storing the object. This
> > semantic patch catches any cases where you are either allocating an
> > array of objects or a single object that do not have corret types in
> > this sense, more precisely, it captures assignments to a variable of
> > type T* where palloc() uses sizeof(T) either alone or with a single
> > expression (assuming this is an array count).
> >
> > The semantic patch is overzealous in the sense that allocation to a
> > "const char **" expects a "sizeof(const char *)" and it cannot deal with
> > typedefs that introduce aliases (these two can be seen in the patch).
> > Although the sizes of these are the same, and Coccinelle do not have a
> > good system for comparing types, it might be better to just follow the
> > convention of always using the type "T*" for any "palloc(sizeof(T))"
> > since it makes automated checking easier and is a small inconvenience;
> > especially considering that coccicheck can easily fix this for you. It
> > also simplifies other automated checking to follow this convention.
>
> I think this kind of thing is better addressed with a static analyzer or
> some heavier compiler annotations and warnings or something like that.
> That kind of tool would have the right level of information to decide
> whether some code is correct. Having a text matching tool do it is just
> never going to be complete, and the problems you mention show that this
> tool fundamentally doesn't understand the code, and just knows how to
> parse the code a little better than grep or sed.
>

Thanks for the comments Peter.

On one hand, I agree with you. Coccinelle understands the *structure* of
the code (e.g., what is a type and what is an identifier, where functions
start and end, etc.), but not the *semantics* of the code (e.g., if two
types are different).

On the other hand, however, development practice is also about following
common coding patterns to avoid problems and improve readability. In this
sense, having strange situations in the code that just happen to work makes
it more difficult for developers as well as for automation tools. For
example, allocating memory for elements of sizeof(T**) and assigning it to
a T** variable will not be caught by something like ASAN (because the sizes
match), and cannot trigger a fault (again, because the sizes match), but
correcting this would still make the life easier for both developers and
static analysis tools.

That said, if your only concern is the "const int *" vs. "int *" thing (but
not the T** vs T* thing), it is possible to deal with this by either adding
Python code to the Coccinelle script to just remove all "const" since they
do not affect the size or detect that one of the types has "const" in it
and just ignore this situation and not suggest to patch it.

> I do like the idea of exploring this tool for facilitating code
> rewriting or semantic patching. But I'm suspicious about using it for
> enforcing coding styles or patterns.
>

The case we're discussing here is one such case where I think that
enforcing a coding style might not be a good idea, but I think it was
valuable to raise the issue and get opinions. (This is why I mentioned the
problem with the patch being overzealous.)

The real value here is not specific patches though. It is having a tool
like Coccinelle for detecting and correcting problematic code as part of
the development practice since I think it would speed up development and
review and also improve the quality of the code.
--
Best wishes,
Mats Kindahl, Timescale

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-01-09 07:55:18 Re: [PATCH] Add get_bytes() and set_bytes() functions
Previous Message Hayato Kuroda (Fujitsu) 2025-01-09 07:52:57 RE: ecpg command does not warn COPY ... FROM STDIN;