Re: pgsql: Generational memory allocator

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Generational memory allocator
Date: 2017-11-25 21:19:20
Message-ID: 21466.1511644760@sss.pgh.pa.us
Lists: pgsql-committers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-11-25 14:50:41 -0500, Tom Lane wrote:
>> Meanwhile, skink has come back with a success as of 0f2458f, rendering
>> the other mystery even deeper --- although at least this confirms the
>> idea that the crashes we are seeing predate the generation.c patch.

> Hm, wonder whether that could be optimization dependent too.

Evidently :-(. I examined my compiler's assembly output for the
relevant line, snapbuild.c:780:

    ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
                                 xlrec->target_node, xlrec->target_tid,
                                 xlrec->cmin, xlrec->cmax,
                                 xlrec->combocid);

which is

	movl	12(%rbx), %eax		# combocid
	movq	16(%rbx), %rcx		# target_node.spcNode + dbNode
	movq	%rbp, %rdx
	movl	24(%rbx), %r8d		# target_node.relNode
	movq	56(%r12), %rdi
	movq	28(%rbx), %r9		# target_tid ... 8 bytes worth
	movl	(%rbx), %esi		# top_xid
	movl	%eax, 16(%rsp)
	movl	8(%rbx), %eax		# cmax
	movl	%eax, 8(%rsp)
	movl	4(%rbx), %eax		# cmin
	movl	%eax, (%rsp)
	call	ReorderBufferAddNewTupleCids

I've annotated the fetches from *rbx to match the data structure:

typedef struct xl_heap_new_cid
{
	/*
	 * store toplevel xid so we don't have to merge cids from different
	 * transactions
	 */
	TransactionId top_xid;
	CommandId	cmin;
	CommandId	cmax;

	/*
	 * don't really need the combocid since we have the actual values right in
	 * this struct, but the padding makes it free and it's useful for
	 * debugging.
	 */
	CommandId	combocid;

	/*
	 * Store the relfilenode/ctid pair to facilitate lookups.
	 */
	RelFileNode target_node;
	ItemPointerData target_tid;
} xl_heap_new_cid;

and what's apparent is that it's grabbing 8 bytes from target_tid, not
just the 6 that the field actually occupies. So the failure case must
occur when the xl_heap_new_cid WAL record lands right at the end of the
palloc'd record buffer, and valgrind notices that we're fetching padding
bytes that lie past the allocation.

I suspect that gcc feels within its rights to do this because the
size/alignment of xl_heap_new_cid is a multiple of 4 bytes, so that
there ought to be 2 padding bytes at the end.

We could possibly prevent that by marking the whole xl_heap_new_cid
struct as packed/align(2), the same way ItemPointerData is marked,
but that would render accesses to all of its fields inefficient
on alignment-sensitive machines. Moreover, if we go that route,
we have a boatload of other xlog record formats with similar hazards.

Instead I propose that we should make sure that the palloc request size
for XLogReaderState->main_data is always maxalign'd. The existing
behavior in DecodeXLogRecord of palloc'ing it only just barely big
enough for the current record seems pretty brain-dead performance-wise
even without this consideration. Generally, if we need to enlarge
that buffer, we should enlarge it significantly, IMO.

BTW, that comment in the middle of xl_heap_new_cid about "padding
makes this field free" seems entirely wrong. By my count the
struct is currently 34 bytes, so if by padding it's talking about
maxalign alignment of the whole struct, that field is costing us 8 bytes
on maxalign-8 machines. There's certainly no internal alignment
that would make it free.

regards, tom lane
