Re: pgsql: Generational memory allocator

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Generational memory allocator
Date: 2017-11-24 18:31:53
Message-ID: 16449.1511548313@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
> I agree it would be useful to get the valgrind log from the animal, but
> I guess you'd need to get valgrind working to fix the issue anyway.
> Anyway, the attached patch fixes the issues for me - strictly speaking
> the Assert is not needed, but it doesn't hurt.

For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run:

==00:00:00:08.768 15475== Invalid read of size 8
==00:00:00:08.768 15475== at 0x710BD0: SnapBuildProcessNewCid (snapbuild.c:780)
==00:00:00:08.768 15475== by 0x70291E: LogicalDecodingProcessRecord (decode.c:371)
==00:00:00:08.768 15475== by 0x705A04: pg_logical_slot_get_changes_guts (logicalfuncs.c:308)
==00:00:00:08.768 15475== by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.768 15475== by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.768 15475== by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.768 15475== by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.768 15475== by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.768 15475== by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.768 15475== by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.768 15475== by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.768 15475== by 0x6EFDC9: PostmasterMain (postmaster.c:4412)
==00:00:00:08.768 15475== Address 0xe7e846c is 28 bytes inside a block of size 34 client-defined
==00:00:00:08.768 15475== at 0x89A17F: palloc (mcxt.c:871)
==00:00:00:08.768 15475== by 0x51FF07: DecodeXLogRecord (xlogreader.c:1283)
==00:00:00:08.768 15475== by 0x520CD3: XLogReadRecord (xlogreader.c:475)
==00:00:00:08.768 15475== by 0x7059E4: pg_logical_slot_get_changes_guts (logicalfuncs.c:293)
==00:00:00:08.769 15475== by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.769 15475== by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.769 15475== by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.769 15475== by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.769 15475== by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.769 15475== by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.769 15475== by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.769 15475== by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.769 15475==

Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in?

I'm tempted to push Tomas' patch as-is, just to see if skink sees the
same failure I do.

BTW, I'm pretty certain that there are additional bugs in generation.c's
valgrind support, in code paths that the regression tests likely do not
reach. I do not think it's setting up valgrind correctly for "large"
chunks, and it looks to me like GenerationCheck would fail because it
tries to access chunk->requested_size which is generally kept NOACCESS.
I'm tempted to rip out all of the logic concerned with flipping
chunk->requested_size between normal and NOACCESS states, as that seems to
me like an exercise much more likely to introduce bugs than to catch any.

regards, tom lane

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2017-11-24 18:43:40 pgsql: Fix bug in generation.c's valgrind support.
Previous Message Andrew Dunstan 2017-11-24 15:35:43 Re: pgsql: Generational memory allocator