Re: maintenance_work_mem and CREATE INDEX time

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: maintenance_work_mem and CREATE INDEX time
Date: 2013-07-23 21:02:53
Message-ID: CAMkU=1xv_ZSVSMwmAc7wv=gSYWumMrpTHOkpUDtUTqkSpX8Mqg@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Tue, Jul 23, 2013 at 1:23 AM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>> Hello,
>>
>> While studying the effect of maintenance_work_mem on the time taken
>> by CREATE INDEX, I observed that for values of maintenance_work_mem
>> below the threshold at which an internal sort is performed, the time
>> taken by CREATE INDEX increases as maintenance_work_mem increases.
>>
>> My guess is that for all those values an external sort is chosen at
>> some point, and the larger the value of maintenance_work_mem, the
>> later the switch to external sort is made, causing CREATE INDEX to
>> take longer. That is, a smaller value of maintenance_work_mem would
>> be preferable when an external sort is going to be performed anyway.
>> Does that make sense?
>>
>
> Upon further investigation, I found that the delay before the switch
> to external sort caused by a larger value of maintenance_work_mem is
> small compared to the total time of CREATE INDEX.

If you are using trace_sort to report that, it reports the switch as
happening as soon as it runs out of memory.

At that point, all we have been doing is reading tuples into memory.
The time it takes to do that depends on maintenance_work_mem, because
that determines how many tuples fit in memory. But all the rest of the
tuples need to be read sooner or later anyway, so pushing more of them
to later doesn't improve things overall; it just shifts the timing
around. Doubling maintenance_work_mem just means twice as many tuples
are read before the switch is logged, not that fewer are read in total.
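
To make that concrete, here is a rough Python sketch (the names
fill_memory and mem_budget are mine, with the budget counted in tuples
rather than bytes; the real code is C in tuplesort.c):

    def fill_memory(input_iter, mem_budget):
        """All that has happened when trace_sort reports the switch:
        a plain sequential read until the memory budget is exhausted."""
        buffered = []
        for t in input_iter:            # input_iter: a shared iterator
            buffered.append(t)
            if len(buffered) >= mem_budget:
                break                   # "switching to external sort"
        return buffered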

After it reports the switch, it still needs to heapify the existing
in-memory tuples before the tapesort proper can begin. This is where
the true lost opportunities start to arise, as the large heap starts
driving cache misses that would not happen at all under a smaller
setting.

Once the existing tuples are heapified, it continues to use the heap,
popping tuples from it to write out to "tape" and pushing newly read
tuples onto it. This stage suffers the same lost opportunities.
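
Both steps, continuing the same sketch (an analogy for the replacement
selection tuplesort.c uses to build runs, not the actual code):

    import heapq

    def write_runs(buffered, input_iter):
        """Heapify what fill_memory buffered, then pop-and-push to
        build sorted runs on "tape" (replacement selection)."""
        heap = [(0, t) for t in buffered]   # entries: (run number, tuple)
        heapq.heapify(heap)     # the big heap; cache misses start here

        runs, run, current = [], [], 0
        while heap:
            run_no, val = heapq.heappop(heap)
            if run_no != current:           # current run is exhausted
                runs.append(run)
                run, current = [], run_no
            run.append(val)                 # "write to tape"
            nxt = next(input_iter, None)
            if nxt is not None:
                # A tuple sorting below what was just written cannot
                # join this run without breaking its order; defer it.
                tag = current if nxt >= val else current + 1
                heapq.heappush(heap, (tag, nxt))
        if run:
            runs.append(run)
        return runs

Every heappop and heappush here walks a heap whose size is set by
maintenance_work_mem, which is where those lost opportunities
accumulate.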

Once all the tuples are written out and the merging starts, the large
maintenance_work_mem is no longer a penalty, as the new heap is
limited by the number of tapes, which is almost always much smaller.
In fact, this stage will actually be faster, but not by enough to make
up for the earlier slowdown.
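
In the same sketch terms, the merge heap holds one entry per tape:

    import heapq

    def merge_runs(runs):
        """K-way merge of the sorted runs; the heap stays tiny and
        cache-resident no matter how large maintenance_work_mem was."""
        iters = [iter(r) for r in runs]
        heap = []
        for i, it in enumerate(iters):
            first = next(it, None)
            if first is not None:
                heap.append((first, i))
        heapq.heapify(heap)             # heap size == number of tapes

        out = []
        while heap:
            val, i = heapq.heappop(heap)
            out.append(val)             # final sorted output
            nxt = next(iters[i], None)
            if nxt is not None:
                heapq.heappush(heap, (nxt, i))
        return out

    # End to end (inp must be one shared iterator):
    #   inp = iter(data)
    #   out = merge_runs(write_runs(fill_memory(inp, budget), inp))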

So it is not surprising that the time elapsed before the switch is
reported accounts for only a small part of the overall time difference.

Cheers,

Jeff
