Re: Fix some ubsan/asan related issues

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Tristan Partin <tristan(at)neon(dot)tech>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Fix some ubsan/asan related issues
Date: 2024-09-16 13:29:21
Message-ID: CAEG8a3LKmvZtT4jSRXma=eaDnEvdRD8cFdOVVRVCF50aNdMN2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tristan,

On Tue, Feb 6, 2024 at 11:53 AM Tristan Partin <tristan(at)neon(dot)tech> wrote:
>
> On Tue Jan 30, 2024 at 3:58 PM CST, Andres Freund wrote:
> > Hi,
> >
> > On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
> > > From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
> > > From: Tristan Partin <tristan(at)neon(dot)tech>
> > > Date: Mon, 29 Jan 2024 18:03:39 -0600
> > > Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
> > > NULL
> >
> > > If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
> > > the API contract of memcpy in glibc. The two pointer arguments are
> > > marked as nonnull, even in the event the amount to copy is 0 bytes.
> >
> > It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
> > that something useful?
>
> Dropped. Will change on the Neon side. Should we add an assert
> somewhere for good measure? Near the memcpy in question?
>
> > > From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
> > > From: Tristan Partin <tristan(at)neon(dot)tech>
> > > Date: Wed, 24 Jan 2024 17:07:01 -0600
> > > Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address
> > >
> > > The ecpg is parser is extremely leaky, so we need to silence leak
> > > detection.
> >
> > This does stuff beyond epcg...
>
> Dropped.
>
> > > +if get_option('b_sanitize').contains('address')
> > > + cdata.set('USE_ADDRESS_SANITIZER', 1)
> > > +endif
> > >
> > > ###############################################################
> > > # NLS / Gettext
> > > diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> > > index ac409b0006..e18e716d9c 100644
> > > --- a/src/bin/initdb/initdb.c
> > > +++ b/src/bin/initdb/initdb.c
> > > @@ -338,6 +338,17 @@ do { \
> > > output_failed = true, output_errno = errno; \
> > > } while (0)
> > >
> > > +#ifdef USE_ADDRESS_SANITIZER
> >
> > When asan is used __SANITIZE_ADDRESS__ is defined, so we don't need to
> > implement this ourselves.
>
> Thanks!
>
> > > +const char *__asan_default_options(void);
> > > +
> > > +const char *__asan_default_options(void)
> > > +{
> > > + return "detect_leaks=0";
> > > +}
> > > +
> > > +#endif
> >
> > Wonder if we should move this into some static library and link it into all
> > binaries that don't want leak detection? It doesn't seem great to have to
> > adjust this in a bunch of files if we want to adjust the options...
>
> See attached patches. Here is what I found to be necessary to get
> a -Db_sanitize=address,undefined build to successfully make it through
> tests. I do have a few concerns about the patch.
>
> 1. For whatever reason, __SANITIZE_LEAK__ is not defined when the leak
> sanitizer is enabled. So you will see me use this, to make some
> include directives work. I don't like this as a final solution
> because someone could use -fsanitize=leak.
> 2. I tracked down what seems to be a valid leak in adt/xml.c. Attached
> (test.sql) is a fairly minimal reproduction of what is needed to show
> the leak. I didn't spend too much time tracking it down. Might get to
> it later, who knows. Below you will find the backtrace, and whoever
> wants to try their hand at fixing it will need to comment out
> xmlNewNode in the leak.supp file.
> 3. I don't love the new library name. Maybe it should be name more lsan
> specific.
> 4. Should pg_attribute_no_asan be renamed to
> pg_attribute_no_sanitize_address? That would match
> pg_attribute_no_sanitize_alignment.
>
> I will also attach a Meson test log for good measure. I didn't try
> testing anything that requires PG_TEST_EXTRA, but I suspect that
> everything will be fine.
>
> Alexander, I haven't yet gotten to the things you pointed out in the
> sibling thread.
>
> ==221848==ERROR: LeakSanitizer: detected memory leaks
>
> Direct leak of 120 byte(s) in 1 object(s) allocated from:
> #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
> #1 0x7fac4a48427d in xmlNewNode (/lib64/libxml2.so.2+0x5d27d) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
> #2 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
> #3 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
> #4 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
> #5 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
> #6 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
> #7 0x109655d in ExecProject ../src/include/executor/executor.h:389
> #8 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
> #9 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
> #10 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
> #11 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
> #12 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
> #13 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
> #14 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
> #15 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
> #16 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
> #17 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
> #18 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
> #19 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
> #20 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
> #21 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
> #22 0x11a5f67 in main ../src/backend/main/main.c:198
> #23 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
> #24 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
> #25 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)
>
> Indirect leak of 13 byte(s) in 1 object(s) allocated from:
> #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
> #1 0x7fac4a4e106f in xmlStrndup (/lib64/libxml2.so.2+0xba06f) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
> #2 0x7fac4a4842c0 in xmlNewNode (/lib64/libxml2.so.2+0x5d2c0) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
> #3 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
> #4 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
> #5 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
> #6 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
> #7 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
> #8 0x109655d in ExecProject ../src/include/executor/executor.h:389
> #9 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
> #10 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
> #11 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
> #12 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
> #13 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
> #14 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
> #15 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
> #16 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
> #17 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
> #18 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
> #19 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
> #20 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
> #21 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
> #22 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
> #23 0x11a5f67 in main ../src/backend/main/main.c:198
> #24 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
> #25 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
> #26 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)
>
> SUMMARY: AddressSanitizer: 133 byte(s) leaked in 2 allocation(s).
>
> --
> Tristan Partin
> Neon (https://neon.tech)

I tried your v1-0002, it works at compile phase but failed to run initdb
with the following leak detected:

=================================================================
==64983==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 248 byte(s) in 1 object(s) allocated from:
#0 0x7fc7729df9cf in __interceptor_malloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x55bff5480e8b in save_ps_display_args
../postgres/src/backend/utils/misc/ps_status.c:190
#2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
#3 0x7fc771924249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58

Indirect leak of 19 byte(s) in 1 object(s) allocated from:
#0 0x7fc77299777b in __interceptor_strdup
../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
#1 0x55bff5480f41 in save_ps_display_args
../postgres/src/backend/utils/misc/ps_status.c:198
#2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
#3 0x7fc771924249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58

I worked around by moving *new_environ* as a global variable, I think this is
false positive report so maybe this deserves a patch?

I tried to apply your v2 patch set but v2-0004 seems out of date.

--
Regards
Junwang Zhao

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Jones 2024-09-16 13:40:07 Re: Psql meta-command conninfo+
Previous Message Jakub Wartak 2024-09-16 13:11:31 Re: scalability bottlenecks with (many) partitions (and more)