From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Valgrind Memcheck support |
Date: | 2013-06-09 21:25:59 |
Message-ID: | 20130609212559.GB491289@tornado.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Valgrind's Memcheck tool[1] is handy for finding bugs, but our use of a custom
allocator limits its ability to detect problems in unmodified PostgreSQL.
During the 9.1 beta cycle, I found some bugs[2] with a rough patch adding
instrumentation to aset.c and mcxt.c such that Memcheck understood our
allocator. I've passed that patch around to a few people over time, and I've
now removed the roughness such that it's ready for upstream. In hopes of
making things clearer in the commit history, I've split out a preliminary
refactoring patch from the main patch and attached each separately.
Besides the aset.c/mcxt.c instrumentation, this patch adds explicit checks for
undefined memory to PageAddItem() and printtup(); this has caught C-language
functions that fabricate a Datum without initializing all bits. The inclusion
of all this is controlled by a pg_config_manual.h setting. The patch also
adds a "suppression file" that directs Valgrind to silences nominal errors we
don't plan to fix.
To smoke-test the instrumentation, I used "make installcheck" runs on x86_64
GNU/Linux and ppc64 GNU/Linux. This turned up various new and newly-detected
memory bugs, which I will discuss in a separate thread. With those fixed,
"make installcheck" has a clean report (in my one particular configuration).
I designed the suppression file to work across platforms; I specifically
anticipated eventual use on x86_64 Darwin and x86_64 FreeBSD. Valgrind 3.8.1
quickly crashed when running PostgreSQL on Darwin; I did not dig further.
Since aset.c and mcxt.c contain the hottest code paths in the backend, I
verified that a !USE_VALGRIND, non-assert build produces the same code before
and after the patch. Testing that revealed the need to move up the
AllocSizeIsValid() check in repalloc(), though I don't understand why GCC
reacted that way.
Peter Geoghegan and Korry Douglas provided valuable feedback on earlier
versions of this code.
Possible future directions:
- Test "make installcheck-world". When I last did this in past years, contrib did
trigger some errors.
- Test recovery, such as by running a streaming replica under Memcheck while
the primary runs "make installcheck-world".
- Test newer compilers and higher optimization levels. I used GCC 4.2 at -O1.
A brief look at -O2 results showed a new error that I have not studied. GCC
4.8 at -O3 might show still more due to increasingly-aggressive assumptions.
- A buildfarm member running its installcheck steps this way.
- Memcheck has support for detecting leaks. I have not explored that side at
all, always passing --leak-check=no. We could add support for freeing
"everything" at process exit, thereby making the leak detection meaningful.
Brief notes for folks reproducing my approach: I typically start the
Memcheck-hosted postgres like this:
valgrind --leak-check=no --gen-suppressions=all \
--suppressions=src/tools/valgrind.supp --time-stamp=yes \
--log-file=$HOME/pg-valgrind/%p.log postgres
If that detected an error on which I desired more detail, I would rerun a
smaller test case with "--track-origins=yes --read-var-info=yes". That slows
things noticeably but gives more-specific messaging. When even that left the
situation unclear, I would temporarily hack allocChunkLimit so every palloc()
turned into a malloc().
I strongly advise installing the latest-available Valgrind, particularly
because recent releases suffer far less of a performance drop processing the
instrumentation added by this patch. A "make installcheck" run takes 273
minutes under Vaglrind 3.6.0 but just 27 minutes under Valgrind 3.8.1.
Thanks,
nm
[1] http://valgrind.org/docs/manual/mc-manual.html
[2] http://www.postgresql.org/message-id/20110312133224.GA7833@tornado.gateway.2wire.net
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
aset-debug-refactor-v1.patch | text/plain | 8.7 KB |
valgrind-hooks-201306-v1.patch | text/plain | 26.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2013-06-09 21:29:32 | 9.3 crop of memory errors |
Previous Message | Andrew Dunstan | 2013-06-09 20:55:02 | postgres_fdw regression tests order dependency |