From: | Mats Kindahl <mats(at)timescale(dot)com> |
---|---|
To: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
Cc: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-01-18 19:44:00 |
Message-ID: | CA+14425Q2u14FgqndYyHwmUds+v+9Sg6kyUqKMFsC2Gb+H2Bnw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jan 14, 2025 at 4:19 PM Aleksander Alekseev <
aleksander(at)timescale(dot)com> wrote:
> IMO the best solution would be re-submitting all the patches to this
> thread. Also please make sure the patchset is registered on the
> nearest open CF [1] This will ensure that the patchset is built on our
> CI (aka cfbot [2]) and will not be lost.
>
> [1]: https://commitfest.postgresql.org/
> [2]: http://cfbot.cputube.org/
>
>
Thank you Aleksander,
Here is a new post with all patches attached and all comments combined.
For those of you that are not aware of it: Coccinelle is a tool for pattern
matching and text transformation for C code and can be used for detection
of problematic programming patterns and to make complex, tree-wide patches
easy. It is aware of the structure of C code and is better suited to make
complicated changes than what is possible using normal text substitution
tools like Sed and Perl. I've noticed it's been used at a few cases way
back to fix issues.[1]
Coccinelle have been successfully been used in the Linux project since 2008
and is now an established tool for Linux development and a large number of
semantic patches have been added to the source tree to capture everything
from generic issues (like eliminating the redundant A in expressions like
"!A || (A && B)") to more Linux-specific problems like adding a missing
call to kfree().
Although PostgreSQL is nowhere the size of the Linux kernel, it is
nevertheless of a significant size and would benefit from incorporating
Coccinelle into the development. I noticed it's been used in a few cases
way back (like 10 years back) to fix issues in the PostgreSQL code, but I
thought it might be useful to make it part of normal development practice
to, among other things:
- Identify and correct bugs in the source code both during development and
review.
- Make large-scale changes to the source tree to improve the code based on
new insights.
- Encode and enforce APIs by ensuring that function calls are used
correctly.
- Use improved coding patterns for more efficient code.
- Allow extensions to automatically update code for later PostgreSQL
versions.
To that end, I created a series of patches to show how it could be used in
the PostgreSQL tree. It is a lot easier to discuss concrete code and I
split it up into separate messages since that makes it easier to discuss
each individual patch. The series contains code to make it easy to work
with Coccinelle during development and reviews, as well as examples of
semantic patches that capture problems, demonstrate how to make large-scale
changes, how to enforce APIs, and also improve some coding patterns.
The first three patches contain the coccicheck.py script and the
integration with the build system (both Meson and Autoconf).
# Coccicheck Script
It is a re-implementation of the coccicheck script that the Linux kernel
uses. We cannot immediately use the coccicheck script since it is quite
closely tied to the Linux source code tree and we need to have something
that both supports Autoconf and Meson. Since Python seems to be used more
and more in the tree, it seems to be the most natural choice. (I have no
strong opinion on what language to use, but think it would be good to have
something that is as platform-independent as possible.)
The intention is that we should be able to use the Linux semantic patches
directly, so it supports the "Requires" and "Options" keywords, which can
be used to require a specific version of spatch(1) and add options to the
execution of that semantic patch, respectively.
# Autoconf support
The changes to Autoconf modifies the configure.ac and related files (in
particular Makefile.global.in). At this point, I have deliberately not
added support for pgxs so extensions cannot use coccicheck through the
PostgreSQL installation. This is something that we can add later though.
The semantic patches are expected to live in cocci/ directory under the
root and the patch uses the pattern cocci/**/*.cocci to find all semantic
patches. Right now there are no subdirectories for the semantic patches,
but this might be something we want to add to create different categories
of scripts.
The coccicheck target is used in the same way as for the Linux kernel, that
is, to generate and apply all patches suggested by the semantic patches,
you type:
make coccicheck MODE=patch | patch -p1
Linux as support for a few more variables: V to set the verbosity, J to use
multiple jobs for processing the semantic patches, M to select a different
directory to apply the semantic patches to, and COCCI to use a single
specific semantic patch rather than all available. I have not added support
for this right now, but if you think this is valuable, it should be
straightforward to add.
I used autoconf 2.69, as mentioned in configure.ac, but that generate a
bigger diff than I expected. Any advice here is welcome.
# Meson Support
The support for Meson is done by adding three coccicheck targets: one for
each mode. To apply all patches suggested by the semantic patches using
ninja (as is done in [2]), you type the following in the build directory
generated by Meson (e.g., the "build/" subdirectory).
ninja coccicheck-patch | patch -p1 -d ..
If you want to pass other flags you have to set the SPFLAGS environment
variable when calling ninja:
SPFLAGS=--debug ninja coccicheck-report
# Semantic Patch: Wrong type for palloc()
This is the first example of a semantic patch and shows how to capture and
fix a common problem.
If you use an palloc() to allocate memory for an object (or an array of
objects) and by mistake type something like:
StringInfoData *info = palloc(sizeof(StringInfoData*));
You will not allocate enough memory for storing the object. This semantic
patch catches any cases where you are either allocating an array of objects
or a single object that do not have corret types in this sense, more
precisely, it captures assignments to a variable of type T* where palloc()
uses sizeof(T) either alone or with a single expression (assuming this is
an array count).
The semantic patch is overzealous in the sense that using the wrong typedef
will suggest a change (this can be seen in the patch). Although the sizes
of these are the same, it is probably be better to just follow the
convention of always using the type "T*" for any "palloc(sizeof(T))" since
the typedef can change at any point and would then introduce a bug.
Coccicheck can easily fix this for you, so it is straightforward to enforce
this. It also simplifies other automated checking to follow this convention.
We don't really have any real bugs as a result from this, but we have one
case where an allocation of "sizeof(LLVMBasicBlockRef*)" is allocated to an
"LLVMBasicBlockRef*", which strictly speaking is not correct (it should be
"sizeof(LLVMBasicBlockRef)"). However, since they are both pointers, there
is no risk of incorrect allocation size. One typedef usage that does not
match.
# Semantic Patch: Introduce palloc_array() and palloc_object() where
possible
This is an example of a large-scale refactoring to improve the code.
For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in
commit 2016055a92f to provide more type-safety, but these functions are not
widely used. This semantic patch captures and replaces all uses of palloc()
where palloc_array() or palloc_object() could be used instead. It
deliberately does not touch cases where it is not clear that the
replacement can be done.
[1]: https://coccinelle.gitlabpages.inria.fr/website/
[2]: https://www.postgresql.org/docs/current/install-meson.html
--
Best wishes,
Mats Kindahl, Timescale
Attachment | Content-Type | Size |
---|---|---|
0004-Add-semantic-patch-for-sizeof-using-palloc.v2.patch | text/x-patch | 3.3 KB |
0001-Add-initial-coccicheck-script.v2.patch | text/x-patch | 7.1 KB |
0002-Create-coccicheck-target-for-autoconf.v2.patch | text/x-patch | 9.1 KB |
0003-Add-meson-build-for-coccicheck.v2.patch | text/x-patch | 4.5 KB |
0005-Add-script-for-palloc_array.v2.patch | text/x-patch | 560.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-01-18 20:37:54 | Re: Adding comments to help understand psql hidden queries |
Previous Message | 赵庭海 (庭章) | 2025-01-18 19:32:19 | Re:Limit length of queryies in pg_stat_statement extension |