Re: xid wraparound danger due to INDEX_CLEANUP false

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-04-23 03:08:42
Message-ID: CAH2-Wz=WRu6NMWtit2weDnuGxdsWeNyFygeBP_zZ2Sso0YAGFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 22, 2020 at 6:05 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> (It would be nice if we could teach Valgrind to "poison" buffers when
> we don't have a pin held...that would probably have caught this issue
> almost immediately.)

I can get Valgrind to complain about it when the regression tests are
run with the attached patch applied. I see this in the logs at several
points when "make installcheck" runs:

==1082059== VALGRINDERROR-BEGIN
==1082059== Invalid read of size 4
==1082059== at 0x21D8DE: btvacuumpage (nbtree.c:1370)
==1082059== by 0x21DA61: btvacuumscan (nbtree.c:1039)
==1082059== by 0x21DBD5: btbulkdelete (nbtree.c:879)
==1082059== by 0x215821: index_bulk_delete (indexam.c:698)
==1082059== by 0x20FDCE: lazy_vacuum_index (vacuumlazy.c:2427)
==1082059== by 0x2103EA: lazy_vacuum_all_indexes (vacuumlazy.c:1794)
==1082059== by 0x211EA1: lazy_scan_heap (vacuumlazy.c:1681)
==1082059== by 0x211EA1: heap_vacuum_rel (vacuumlazy.c:510)
==1082059== by 0x360414: table_relation_vacuum (tableam.h:1457)
==1082059== by 0x360414: vacuum_rel (vacuum.c:1880)
==1082059== by 0x361785: vacuum (vacuum.c:449)
==1082059== by 0x361F0E: ExecVacuum (vacuum.c:249)
==1082059== by 0x4D979C: standard_ProcessUtility (utility.c:823)
==1082059== by 0x4D9C7F: ProcessUtility (utility.c:522)
==1082059== by 0x4D6791: PortalRunUtility (pquery.c:1157)
==1082059== by 0x4D725F: PortalRunMulti (pquery.c:1303)
==1082059== by 0x4D7CEF: PortalRun (pquery.c:779)
==1082059== by 0x4D3BB7: exec_simple_query (postgres.c:1239)
==1082059== by 0x4D4ABD: PostgresMain (postgres.c:4315)
==1082059== by 0x45B0C9: BackendRun (postmaster.c:4510)
==1082059== by 0x45B0C9: BackendStartup (postmaster.c:4202)
==1082059== by 0x45B0C9: ServerLoop (postmaster.c:1727)
==1082059== by 0x45C754: PostmasterMain (postmaster.c:1400)
==1082059== by 0x3BDD68: main (main.c:210)
==1082059== Address 0x6cc7378 is in a rw- anonymous segment
==1082059==
==1082059== VALGRINDERROR-END

(The line numbers might be slightly different to master here, but the
line from btvacuumpage() is definitely the one that accesses the
special area of the B-Tree page after we drop the pin.)

This patch is very rough -- it was just the first thing that I tried.
I don't know how Valgrind remembers the status of shared memory
regions across backends when they're marked with
VALGRIND_MAKE_MEM_NOACCESS(). Even still, this idea looks promising. I
should try to come up with a committable patch before too long.

The good news is that the error I showed is the only error that I see,
at least with this rough patch + "make installcheck". It's possible
that the patch isn't as effective as it could be, though. For one
thing, it definitely won't detect incorrect buffer accesses where a
pin is held but a buffer lock is not held. That seems possible, but a
bit harder.

--
Peter Geoghegan

Attachment Content-Type Size
0002-Valgrind-instrumentation.patch application/octet-stream 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-04-23 03:13:50 Re: PG compilation error with Visual Studio 2015/2017/2019
Previous Message David Rowley 2020-04-23 02:50:22 Re: create partition table caused server crashed with self-referencing foreign key