From: | Mats Kindahl <mats(at)timescale(dot)com> |
---|---|
To: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
Cc: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-03-02 13:26:08 |
Message-ID: | CA+144273GOMq5NOc+ULbwxTk+Z1_1rwcpMs0gr8Yr6r9T=OjvQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Jan 18, 2025 at 8:44 PM Mats Kindahl <mats(at)timescale(dot)com> wrote:
> On Tue, Jan 14, 2025 at 4:19 PM Aleksander Alekseev <
> aleksander(at)timescale(dot)com> wrote:
>
>> IMO the best solution would be re-submitting all the patches to this
>> thread. Also please make sure the patchset is registered on the
>> nearest open CF [1] This will ensure that the patchset is built on our
>> CI (aka cfbot [2]) and will not be lost.
>>
>> [1]: https://commitfest.postgresql.org/
>> [2]: http://cfbot.cputube.org/
>>
>>
>
Hi all,
Here is a new set of patches rebased on the latest version of the Postgres.
I decided to just include the semantic patches in each patch since it is
trivial to generate the patch and build using:
ninja coccicheck-patch | patch -d .. -p1 && ninja
I repeat the description from the previous patch set and add comments where
things have changed, but I have also added two semantic patches, which are
described last.
For those of you that are not aware of it: Coccinelle is a tool for pattern
> matching and text transformation for C code and can be used for detection
> of problematic programming patterns and to make complex, tree-wide patches
> easy. It is aware of the structure of C code and is better suited to make
> complicated changes than what is possible using normal text substitution
> tools like Sed and Perl. I've noticed it's been used at a few cases way
> back to fix issues.[1]
>
> Coccinelle have been successfully been used in the Linux project since
> 2008 and is now an established tool for Linux development and a large
> number of semantic patches have been added to the source tree to capture
> everything from generic issues (like eliminating the redundant A in
> expressions like "!A || (A && B)") to more Linux-specific problems like
> adding a missing call to kfree().
>
> Although PostgreSQL is nowhere the size of the Linux kernel, it is
> nevertheless of a significant size and would benefit from incorporating
> Coccinelle into the development. I noticed it's been used in a few cases
> way back (like 10 years back) to fix issues in the PostgreSQL code, but I
> thought it might be useful to make it part of normal development practice
> to, among other things:
>
> - Identify and correct bugs in the source code both during development and
> review.
> - Make large-scale changes to the source tree to improve the code based on
> new insights.
> - Encode and enforce APIs by ensuring that function calls are used
> correctly.
> - Use improved coding patterns for more efficient code.
> - Allow extensions to automatically update code for later PostgreSQL
> versions.
>
> To that end, I created a series of patches to show how it could be used in
> the PostgreSQL tree. It is a lot easier to discuss concrete code and I
> split it up into separate messages since that makes it easier to discuss
> each individual patch. The series contains code to make it easy to work
> with Coccinelle during development and reviews, as well as examples of
> semantic patches that capture problems, demonstrate how to make large-scale
> changes, how to enforce APIs, and also improve some coding patterns.
>
> The first three patches contain the coccicheck.py script and the
> integration with the build system (both Meson and Autoconf).
>
> # Coccicheck Script
>
> It is a re-implementation of the coccicheck script that the Linux kernel
> uses. We cannot immediately use the coccicheck script since it is quite
> closely tied to the Linux source code tree and we need to have something
> that both supports Autoconf and Meson. Since Python seems to be used more
> and more in the tree, it seems to be the most natural choice. (I have no
> strong opinion on what language to use, but think it would be good to have
> something that is as platform-independent as possible.)
>
> The intention is that we should be able to use the Linux semantic patches
> directly, so it supports the "Requires" and "Options" keywords, which can
> be used to require a specific version of spatch(1) and add options to the
> execution of that semantic patch, respectively.
>
I have added support for using multiple jobs similar to how "make -jN"
works. This is also supported by the autoconf and ninja builds
> # Autoconf support
>
> The changes to Autoconf modifies the configure.ac and related files (in
> particular Makefile.global.in). At this point, I have deliberately not
> added support for pgxs so extensions cannot use coccicheck through the
> PostgreSQL installation. This is something that we can add later though.
>
> The semantic patches are expected to live in cocci/ directory under the
> root and the patch uses the pattern cocci/**/*.cocci to find all semantic
> patches. Right now there are no subdirectories for the semantic patches,
> but this might be something we want to add to create different categories
> of scripts.
>
> The coccicheck target is used in the same way as for the Linux kernel,
> that is, to generate and apply all patches suggested by the semantic
> patches, you type:
>
> make coccicheck MODE=patch | patch -p1
>
> Linux as support for a few more variables: V to set the verbosity, J to
> use multiple jobs for processing the semantic patches, M to select a
> different directory to apply the semantic patches to, and COCCI to use a
> single specific semantic patch rather than all available. I have not added
> support for this right now, but if you think this is valuable, it should be
> straightforward to add.
>
> I used autoconf 2.69, as mentioned in configure.ac, but that generate a
> bigger diff than I expected. Any advice here is welcome.
>
Using the parameter "JOBS" allow you to use multiple jobs, e.g.:
make coccicheck MODE=patch JOBS=4 | patch -p1
> # Meson Support
>
> The support for Meson is done by adding three coccicheck targets: one for
> each mode. To apply all patches suggested by the semantic patches using
> ninja (as is done in [2]), you type the following in the build directory
> generated by Meson (e.g., the "build/" subdirectory).
>
> ninja coccicheck-patch | patch -p1 -d ..
>
> If you want to pass other flags you have to set the SPFLAGS environment
> variable when calling ninja:
>
> SPFLAGS=--debug ninja coccicheck-report
>
If you want to use multiple jobs, you use something like this:
JOBS=4 ninja coccicheck-patch | patch -d .. -p1
> # Semantic Patch: Wrong type for palloc()
>
> This is the first example of a semantic patch and shows how to capture and
> fix a common problem.
>
> If you use an palloc() to allocate memory for an object (or an array of
> objects) and by mistake type something like:
>
> StringInfoData *info = palloc(sizeof(StringInfoData*));
>
> You will not allocate enough memory for storing the object. This semantic
> patch catches any cases where you are either allocating an array of objects
> or a single object that do not have corret types in this sense, more
> precisely, it captures assignments to a variable of type T* where palloc()
> uses sizeof(T) either alone or with a single expression (assuming this is
> an array count).
>
> The semantic patch is overzealous in the sense that using the wrong
> typedef will suggest a change (this can be seen in the patch). Although the
> sizes of these are the same, it is probably be better to just follow the
> convention of always using the type "T*" for any "palloc(sizeof(T))" since
> the typedef can change at any point and would then introduce a bug.
> Coccicheck can easily fix this for you, so it is straightforward to enforce
> this. It also simplifies other automated checking to follow this convention.
>
> We don't really have any real bugs as a result from this, but we have one
> case where an allocation of "sizeof(LLVMBasicBlockRef*)" is allocated to an
> "LLVMBasicBlockRef*", which strictly speaking is not correct (it should be
> "sizeof(LLVMBasicBlockRef)"). However, since they are both pointers, there
> is no risk of incorrect allocation size. One typedef usage that does not
> match.
>
> # Semantic Patch: Introduce palloc_array() and palloc_object() where
> possible
>
> This is an example of a large-scale refactoring to improve the code.
>
> For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in
> commit 2016055a92f to provide more type-safety, but these functions are not
> widely used. This semantic patch captures and replaces all uses of palloc()
> where palloc_array() or palloc_object() could be used instead. It
> deliberately does not touch cases where it is not clear that the
> replacement can be done.
>
# Semantic Patch: replace code with pg_cmp_*
This is an example of a large-scale refactoring to improve the code.
In commit 3b42bdb4716 and 6b80394781c overflow-safe comparison functions
were introduced, but they are not widely used. This semantic patch
identifies some of the more common cases and replaces them with calls to
the corresponding pg_cmp_* function.
The patches give a few instructions of improvement in performance when
checking with Godbolt. It's not much, but since it is so easy to apply
them, it might still be worthwhile.
# Semantic Patch: Replace dynamic allocation of StringInfo with
StringInfoData
Use improved coding patterns for more efficient code.
This semantic patch replaces uses of StringInfo with StringInfoData where
the info is dynamically allocated but (optionally) freed at the end of the
block. This will avoid one dynamic allocation that otherwise has to be
dealt with.
For example, this code:
StringInfo info = makeStringInfo();
...
appendStringInfo(info, ...);
...
return do_stuff(..., info->data, ...);
Can be replaced with:
StringInfoData info;
initStringInfo(&info);
...
appendStringInfo(&info, ...);
...
return do_stuff(..., info.data, ...);
It does not do a replacement in these cases:
- If the variable is assigned to an expression. In this case, the
pointer can "leak" outside the function either through a global variable or
a parameter assignment.
- If an assignment is done to the expression. This cannot leak the data,
but could mean a value-assignment of a structure, so we avoid this case.
- If the pointer is returned.
The cases that this semantic patch fixed when I uploaded the first version
of the other patches seems to have been dealt with, but having it as part
of the code base prevents such cases from surfacing again.
> [1]: https://coccinelle.gitlabpages.inria.fr/website/
> [2]: https://www.postgresql.org/docs/current/install-meson.html
>
--
Best wishes,
Mats Kindahl, Timescale
Attachment | Content-Type | Size |
---|---|---|
0005-Semantic-patch-for-palloc_array-and-palloc_object.v3.patch | text/x-patch | 4.2 KB |
0003-Add-meson-build-for-coccicheck.v3.patch | text/x-patch | 4.6 KB |
0004-Semantic-patch-for-sizeof-using-palloc.v3.patch | text/x-patch | 1.7 KB |
0006-Semantic-patch-for-pg_cmp_-functions.v3.patch | text/x-patch | 4.4 KB |
0007-Semantic-patch-to-use-stack-allocated-StringInfoData.v3.patch | text/x-patch | 4.4 KB |
0001-Add-initial-coccicheck-script.v3.patch | text/x-patch | 8.4 KB |
0002-Create-coccicheck-target-for-autoconf.v3.patch | text/x-patch | 9.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Magnus Hagander | 2025-03-02 14:38:07 | Re: Statistics Import and Export |
Previous Message | Julien Tachoires | 2025-03-02 13:23:54 | Re: Allow table AMs to define their own reloptions |