Re: Excessive WAL generation and related performance issue

From: Jim Nasby <jim(at)nasby(dot)net>
To: Joe Conway <mail(at)joeconway(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Excessive WAL generation and related performance issue
Date: 2014-04-14 23:04:37
Message-ID: 534C6985.6060406@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/14/14, 5:51 PM, Joe Conway wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 04/14/2014 03:17 PM, Jim Nasby wrote:
>> On 4/14/14, 4:50 PM, Andres Freund wrote:
>>> On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
>>>> I realize there are many things that can be done to improve my
>>>> specific scenario, e.g. drop indexes before loading, change
>>>> various configs, etc. My purpose for this post is to ask if it
>>>> is really expected to get over 20 times as much WAL as heap
>>>> data?
>>>
>>> I'd bet a large percentage of this will be full page images of
>>> the index. The values you index are essentially distributed over
>>> the whole index, so you'll modifiy the same indx values
>>> repeatedly. But often enough it won't be in the same checkpoint
>>> and thus will create full page images.
>>
>> My thought exactly...
>>
>> ISTM that we should be able to push all the index inserts to the
>> end of the transaction. That should greatly reduce the amount of
>> full page writes. That would also open the door for doing all the
>> index inserts in parallel.
>
> That's the thing. I'm sure there is tuning and other things to improve
> this particular case, but creating over 20 times as much WAL as real
> data seems like pathological behavior to me.

Can you take a look at what's actually going into WAL when the wheels fall off? I think it should be pretty easy to test the theory that it's a ton of full page writes of index leaf pages...
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-04-14 23:11:05 Re: Including replication slot data in base backups
Previous Message Jim Nasby 2014-04-14 23:02:46 Re: Clock sweep not caching enough B-Tree leaf pages?