From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: crash with assertions and WAL_DEBUG |
Date: | 2014-06-26 13:21:45 |
Message-ID: | 53AC1E69.3030802@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 06/24/2014 05:47 PM, Alvaro Herrera wrote:
>>> diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
>>> index 3c1c81a..4264373 100644
>>> --- a/src/backend/storage/smgr/md.c
>>> +++ b/src/backend/storage/smgr/md.c
>>> @@ -219,6 +219,16 @@ mdinit(void)
>>> &hash_ctl,
>>> HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
>>> pendingUnlinks = NIL;
>>> +
>>> + /*
>>> + * XXX: The checkpointer needs to add entries to the pending ops
>>> + * table when absorbing fsync requests. That is done within a critical
>>> + * section. It means that there's a theoretical possibility that you
>>> + * run out of memory while absorbing fsync requests, which leads to
>>> + * a PANIC. Fortunately the hash table is small so that's unlikely to
>>> + * happen in practice.
>>> + */
>>> + MemoryContextAllowInCriticalSection(MdCxt, true);
>>> }
>>> }
>>
>> Isn't that allowing a bit too much? We e.g. shouldn't allow
>> _fdvec_alloc() within a crritical section. Might make sense to create a
>> child context for it.
>
> I agree.
Ok. That leaves nothing but _fdvec_alloc() in MdCxt.
> Rahila Syed wrote:
>
>> The patch on compilation gives following error,
>>
>> mcxt.c: In function ‘MemoryContextAllowInCriticalSection’:
>> mcxt.c:322: error: ‘struct MemoryContextData’ has no member named
>> ‘allowInCriticalSection’
>>
>> The member in MemoryContextData is defined as 'allowInCritSection' while
>> the MemoryContextAllowInCriticalSection accesses the field as
>> 'context->allowInCriticalSection'.
>
> It appears Heikki did a search'n replace for "->allowInCritSection"
> before submitting, which failed to match the struct declaration.
Uh, looks like I failed to test this at all with assertions enabled.
Once you fix that error, you get a lot of assertion failures :-(.
Attached is a new attempt:
* walDebugCxt is now created at backend startup (in XLOGShmemInit()).
This fixes the issue that Andres pointed out, that the context creation
would PANIC if the first XLogInsert in a backend happens within a
critical section. LWLOCK_STATS had the same issue, it would fail if the
first LWLock acquisition in a process happens within a critical section.
I fixed that by adding an explicit InitLWLockAccess() function and moved
the initialization there.
* hash_create also creates a new memory context within the given
context. That new context needs to inherit the allowInCritSection flag,
otherwise allocating entries in the hash will still fail. Both
LWLOCK_STATS and the pending ops hash table in the checkpointer had this
issue.
* Checkpointer does one palloc to allocate private space to copy the
requests to, in addition to using the hash table. I moved that palloc to
before entering the critical section.
- Heikki
Attachment | Content-Type | Size |
---|---|---|
alloc-in-crit-2.patch | text/x-diff | 15.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | MauMau | 2014-06-26 13:26:01 | Re: [Fwd: Re: proposal: new long psql parameter --on-error-stop] |
Previous Message | Andres Freund | 2014-06-26 12:49:42 | Re: better atomics - v0.5 |