Re: crash with assertions and WAL_DEBUG

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: crash with assertions and WAL_DEBUG
Date: 2014-06-26 13:21:45
Message-ID: 53AC1E69.3030802@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/24/2014 05:47 PM, Alvaro Herrera wrote:
>>> diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
>>> index 3c1c81a..4264373 100644
>>> --- a/src/backend/storage/smgr/md.c
>>> +++ b/src/backend/storage/smgr/md.c
>>> @@ -219,6 +219,16 @@ mdinit(void)
>>> &hash_ctl,
>>> HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
>>> pendingUnlinks = NIL;
>>> +
>>> + /*
>>> + * XXX: The checkpointer needs to add entries to the pending ops
>>> + * table when absorbing fsync requests. That is done within a critical
>>> + * section. It means that there's a theoretical possibility that you
>>> + * run out of memory while absorbing fsync requests, which leads to
>>> + * a PANIC. Fortunately the hash table is small so that's unlikely to
>>> + * happen in practice.
>>> + */
>>> + MemoryContextAllowInCriticalSection(MdCxt, true);
>>> }
>>> }
>>
>> Isn't that allowing a bit too much? We e.g. shouldn't allow
>> _fdvec_alloc() within a crritical section. Might make sense to create a
>> child context for it.
>
> I agree.

Ok. That leaves nothing but _fdvec_alloc() in MdCxt.

> Rahila Syed wrote:
>
>> The patch on compilation gives following error,
>>
>> mcxt.c: In function ‘MemoryContextAllowInCriticalSection’:
>> mcxt.c:322: error: ‘struct MemoryContextData’ has no member named
>> ‘allowInCriticalSection’
>>
>> The member in MemoryContextData is defined as 'allowInCritSection' while
>> the MemoryContextAllowInCriticalSection accesses the field as
>> 'context->allowInCriticalSection'.
>
> It appears Heikki did a search'n replace for "->allowInCritSection"
> before submitting, which failed to match the struct declaration.

Uh, looks like I failed to test this at all with assertions enabled.
Once you fix that error, you get a lot of assertion failures :-(.

Attached is a new attempt:

* walDebugCxt is now created at backend startup (in XLOGShmemInit()).
This fixes the issue that Andres pointed out, that the context creation
would PANIC if the first XLogInsert in a backend happens within a
critical section. LWLOCK_STATS had the same issue, it would fail if the
first LWLock acquisition in a process happens within a critical section.
I fixed that by adding an explicit InitLWLockAccess() function and moved
the initialization there.

* hash_create also creates a new memory context within the given
context. That new context needs to inherit the allowInCritSection flag,
otherwise allocating entries in the hash will still fail. Both
LWLOCK_STATS and the pending ops hash table in the checkpointer had this
issue.

* Checkpointer does one palloc to allocate private space to copy the
requests to, in addition to using the hash table. I moved that palloc to
before entering the critical section.

- Heikki

Attachment Content-Type Size
alloc-in-crit-2.patch text/x-diff 15.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message MauMau 2014-06-26 13:26:01 Re: [Fwd: Re: proposal: new long psql parameter --on-error-stop]
Previous Message Andres Freund 2014-06-26 12:49:42 Re: better atomics - v0.5