Re: valgrind error

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: valgrind error
Date: 2020-05-10 13:29:05
Message-ID: 03e424b8-8dbc-1cd6-f743-3b5b93d5fe9f@2ndQuadrant.com
Lists: pgsql-hackers


On 4/18/20 9:15 AM, Andrew Dunstan wrote:
> I was just trying to revive lousyjack, my valgrind buildfarm animal
> which has been offline for 12 days, after having upgraded the machine
> (fedora 31, gcc 9.3.1, valgrind 3.15) and noticed lots of errors like this:
>
>
> 2020-04-17 19:26:03.483 EDT [63741:3] pg_regress LOG:  statement: CREATE
> DATABASE "regression" TEMPLATE=template0
> ==63717== VALGRINDERROR-BEGIN
> ==63717== Use of uninitialised value of size 8
> ==63717==    at 0xAC5BB5: pg_comp_crc32c_sb8 (pg_crc32c_sb8.c:82)
> ==63717==    by 0x55A98B: XLogRecordAssemble (xloginsert.c:785)
> ==63717==    by 0x55A268: XLogInsert (xloginsert.c:461)
> ==63717==    by 0x8BC9E0: LogCurrentRunningXacts (standby.c:1005)
> ==63717==    by 0x8BC8F9: LogStandbySnapshot (standby.c:961)
> ==63717==    by 0x550CB3: CreateCheckPoint (xlog.c:8937)
> ==63717==    by 0x82A3B2: CheckpointerMain (checkpointer.c:441)
> ==63717==    by 0x56347D: AuxiliaryProcessMain (bootstrap.c:453)
> ==63717==    by 0x83CA18: StartChildProcess (postmaster.c:5474)
> ==63717==    by 0x83A120: reaper (postmaster.c:3045)
> ==63717==    by 0x4874B1F: ??? (in /usr/lib64/libpthread-2.30.so)
> ==63717==    by 0x5056F29: select (in /usr/lib64/libc-2.30.so)
> ==63717==    by 0x8380A0: ServerLoop (postmaster.c:1691)
> ==63717==    by 0x837A1F: PostmasterMain (postmaster.c:1400)
> ==63717==    by 0x74A71D: main (main.c:210)
> ==63717==  Uninitialised value was created by a stack allocation
> ==63717==    at 0x8BC942: LogCurrentRunningXacts (standby.c:984)
> ==63717==
> ==63717== VALGRINDERROR-END
> {
>    <insert_a_suppression_name_here>
>    Memcheck:Value8
>    fun:pg_comp_crc32c_sb8
>    fun:XLogRecordAssemble
>    fun:XLogInsert
>    fun:LogCurrentRunningXacts
>    fun:LogStandbySnapshot
>    fun:CreateCheckPoint
>    fun:CheckpointerMain
>    fun:AuxiliaryProcessMain
>    fun:StartChildProcess
>    fun:reaper
>    obj:/usr/lib64/libpthread-2.30.so
>    fun:select
>    fun:ServerLoop
>    fun:PostmasterMain
>    fun:main
> }
>
>

After many hours of testing I have a culprit for this. With everything
else held constant, the error appears with valgrind 3.15.0 and does not
appear with 3.14.0. So lousyjack will be back on the air before long.

Here are the build flags it's using:

CFLAGS=-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -Wno-format-truncation
-Wno-stringop-truncation -g -fno-omit-frame-pointer -O0 -fPIC
CPPFLAGS=-DUSE_VALGRIND  -DRELCACHE_FORCE_RELEASE -D_GNU_SOURCE
-I/usr/include/libxml2

and valgrind is invoked like this:

valgrind --quiet --trace-children=yes --track-origins=yes
--read-var-info=yes --num-callers=20 --leak-check=no
--gen-suppressions=all --error-limit=no
--suppressions=../pgsql/src/tools/valgrind.supp
--error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END bin/postgres -D data-C

Does anyone see anything here that needs tweaking?
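
In case it helps anyone comparing runs under the two valgrind versions,
here is a rough sketch of how the reports delimited by --error-markers
could be pulled out of a server log. It is only an illustration: the
function name extract_errors and the log file name are made up for this
example and are not part of the buildfarm client. Blocks are tracked per
pid because --trace-children=yes can interleave output from several
processes.

```python
import re
from typing import Dict, Iterable, List

# Marker names as passed to --error-markers in the command line above.
BEGIN, END = "VALGRINDERROR-BEGIN", "VALGRINDERROR-END"
# Valgrind prefixes its own output lines with "==<pid>==".
PREFIX = re.compile(r"^==(\d+)==\s?(.*)$")


def extract_errors(lines: Iterable[str]) -> List[dict]:
    """Return one dict per BEGIN/END block: {"pid": int, "lines": [str]}."""
    open_blocks: Dict[int, List[str]] = {}  # pid -> lines collected so far
    errors: List[dict] = []
    for raw in lines:
        m = PREFIX.match(raw.rstrip("\n"))
        if not m:
            continue  # server log lines, suppression text, and so on
        pid, text = int(m.group(1)), m.group(2).strip()
        if text == BEGIN:
            open_blocks[pid] = []
        elif text == END:
            if pid in open_blocks:
                errors.append({"pid": pid, "lines": open_blocks.pop(pid)})
        elif pid in open_blocks and text:
            open_blocks[pid].append(text)
    return errors


if __name__ == "__main__":
    # "postmaster.log" is a placeholder name, not the real buildfarm path.
    with open("postmaster.log", encoding="utf-8", errors="replace") as f:
        for err in extract_errors(f):
            print(err["pid"], err["lines"][0] if err["lines"] else "(empty)")
```

Running this over the logs from a 3.14.0 run and a 3.15.0 run and diffing
the first line of each block should show quickly which reports are new.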

Note that this is quite an old machine:

andrew(at)freddo:bf (master)*$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          16
Model:               6
Model name:          AMD Athlon(tm) II X2 215 Processor
Stepping:            2
CPU MHz:             2700.000
CPU max MHz:         2700.0000
CPU min MHz:         800.0000
BogoMIPS:            5425.13
Virtualization:      AMD-V
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy
svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_save

I did not manage to reproduce this anywhere else; I tried on various
physical, VirtualBox and Docker instances.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
