Re: uninitialized values in revised prepared xact code

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: uninitialized values in revised prepared xact code
Date: 2014-10-24 14:13:49
Message-ID: 544A5E9D.2090808@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/11/2014 10:20 PM, Bruce Momjian wrote:
>
> Uh, was this fixed. I see a cleanup commit for this C file, but this
> report is from June:
>
> commit 07a4a93a0e35a778c77ffbbbc18de29e859e18f0
> Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Fri May 16 09:47:50 2014 +0300
>
> Initialize tsId and dbId fields in WAL record of COMMIT PREPARED.
>
> Commit dd428c79 added dbId and tsId to the xl_xact_commit struct but missed
> that prepared transaction commits reuse that struct. Fix that.
>
> Because those fields were left unitialized, replaying a commit prepared WAL
> record in a hot standby node would fail to remove the relcache init file.
> That can lead to "could not open file" errors on the standby. Relcache init
> file only needs to be removed when a system table/index is rewritten in the
> transaction using two phase commit, so that should be rare in practice. In
> HEAD, the incorrect dbId/tsId values are also used for filtering in logical
> replication code, causing the transaction to always be filtered out.
>
> Analysis and fix by Andres Freund. Backpatch to 9.0 where hot standby was
> introduced.

No, that was a different issue.

(more below)

> On Mon, Jun 30, 2014 at 11:58:59AM +0200, Andres Freund wrote:
>> Hi,
>>
>> I've just rerun valgrind for the first time in a while and saw the
>> following splat. My guess is it exists since bb38fb0d43c, but that's
>> blindly guessing:
>>
>> ==2049== Use of uninitialised value of size 8
>> ==2049== at 0x4FE66D: EndPrepare (twophase.c:1063)
>> ==2049== by 0x4F231B: PrepareTransaction (xact.c:2217)
>> ==2049== by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
>> ==2049== by 0x79013E: finish_xact_command (postgres.c:2408)
>> ==2049== by 0x78DE97: exec_simple_query (postgres.c:1062)
>> ==2049== by 0x791FDD: PostgresMain (postgres.c:4010)
>> ==2049== by 0x71B13B: BackendRun (postmaster.c:4113)
>> ==2049== by 0x71A86D: BackendStartup (postmaster.c:3787)
>> ==2049== by 0x71714C: ServerLoop (postmaster.c:1566)
>> ==2049== by 0x716804: PostmasterMain (postmaster.c:1219)
>> ==2049== by 0x679405: main (main.c:219)
>> ==2049== Uninitialised value was created by a stack allocation
>> ==2049== at 0x4FE16C: StartPrepare (twophase.c:942)
>> ==2049==
>> ==2049== Syscall param write(buf) points to uninitialised byte(s)
>> ==2049== at 0x5C69640: __write_nocancel (syscall-template.S:81)
>> ==2049== by 0x4FE6AE: EndPrepare (twophase.c:1064)
>> ==2049== by 0x4F231B: PrepareTransaction (xact.c:2217)
>> ==2049== by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
>> ==2049== by 0x79013E: finish_xact_command (postgres.c:2408)
>> ==2049== by 0x78DE97: exec_simple_query (postgres.c:1062)
>> ==2049== by 0x791FDD: PostgresMain (postgres.c:4010)
>> ==2049== by 0x71B13B: BackendRun (postmaster.c:4113)
>> ==2049== by 0x71A86D: BackendStartup (postmaster.c:3787)
>> ==2049== by 0x71714C: ServerLoop (postmaster.c:1566)
>> ==2049== by 0x716804: PostmasterMain (postmaster.c:1219)
>> ==2049== by 0x679405: main (main.c:219)
>> ==2049== Address 0x64694ed is 1,389 bytes inside a block of size 8,192 alloc'd
>> ==2049== at 0x4C27B8F: malloc (vg_replace_malloc.c:298)
>> ==2049== by 0x8E766E: AllocSetAlloc (aset.c:853)
>> ==2049== by 0x8E8E04: MemoryContextAllocZero (mcxt.c:627)
>> ==2049== by 0x8A54D3: AtStart_Inval (inval.c:704)
>> ==2049== by 0x4F1DFC: StartTransaction (xact.c:1841)
>> ==2049== by 0x4F28D1: StartTransactionCommand (xact.c:2529)
>> ==2049== by 0x7900A7: start_xact_command (postgres.c:2383)
>> ==2049== by 0x78DAF4: exec_simple_query (postgres.c:860)
>> ==2049== by 0x791FDD: PostgresMain (postgres.c:4010)
>> ==2049== by 0x71B13B: BackendRun (postmaster.c:4113)
>> ==2049== by 0x71A86D: BackendStartup (postmaster.c:3787)
>> ==2049== by 0x71714C: ServerLoop (postmaster.c:1566)
>> ==2049== Uninitialised value was created by a stack allocation
>> ==2049== at 0x4FE16C: StartPrepare (twophase.c:942)
>>
>> It's probably just padding - twophase.c:1063 is the CRC32 computation of
>> the record data.

Yeah. The padding bytes in TwoPhaseFileHeader were not initialized.

That's simple enough to fix, but when I run valgrind, I get a lot whole
bunch of similar messages. A few are from pgstat: the padding bytes in
the pgstat messages are not initialized. One comes from
write_relcache_init_file(); again I believe it's padding bytes being
uninitialized (in FormData_pg_attribute). And one from the XLogInsert
from heap_insert; there's an uninitialized padding byte in
xl_heap_insert. And so forth.. Is it worthwhile to hunt down all of
these? If there aren't many more than these, it probably is worth it,
but I fear this might be an endless effort. Have we been clean of these
warnings at any point in the past?

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-10-24 14:21:27 Re: pg_receivexlog --status-interval add fsync feedback
Previous Message Andres Freund 2014-10-24 14:10:55 Re: Deferring some AtStart* allocations?