Re: FILLFACTOR and increasing index

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: pgsql-general(at)postgresql(dot)org
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject: Re: FILLFACTOR and increasing index
Date: 2011-06-12 19:14:41
Message-ID: 4DF51021.2020607@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

2011-05-12 00:28 keltezéssel, Tomas Vondra írta:
> Hi,
>
> I've studied the implementation of the btree indexes and how exactly the
> fillfactor is used, and in general
>
> - when a page split happens, the process needs to obtain more locks
> than with simple insert, which may result in contention with other
> processes that modify the index (the same pages)
>
> - the fillfactor is used only for the leaf pages, the rest of the index
> does not use it (directly)
>
> So lowering the number of page splits might remove some contention when
> there's a lot of processes accessing the same pages.
>
> But that's the theory - I really was not able to come up with a test
> that benefits from lower fillfactor. Actually the lower the fillfactor,
> the larger the index, which may be a significant issue with huge indexes.

We recently had a testcase for exercising FILLFACTOR on indexes.
Several (15+) GB raw data arrives daily and must be imported into
the database for analytic purposes, the table is heavily partitioned
and each partition has 5 or 6 indexes. The importer is heavily threaded
and uses COPY to insert the data. This is strictly an INSERT-only scenario,
the analysis comes later. This is where FILLFACTOR=70 helped to
reduce the index contention.

With the default 90% (and 100% on the table itself) when a lot of data
arrived in burst that were in the same time interval so 4-8 threads tried
to push data into the same partition, individual data chunks (about 10-15MB
each, about 63500 pieces) were imported in about 1.5-2.5 minutes, with
a few spikes over 4 minutes. This was the result of a few days long
network outage, the data collectors sent their backlogs.

When all indexes were re-created with FILLFACTOR=70, the import times
went down considerably. Out of about 63000 raw data chunks, there were
only 6 or 8 where the import time fell in the 10-25 seconds range, the rest
of them were imported under 10 seconds and the majority of them (65%)
were imported under 3 seconds. The system has 24 cores, so we can use
8 of them dedicated to importing. This almost 1TB data was imported
in about 3.5-4 hours after the FILLFACTOR change. Without it, it didn't
finish in 2 days.

Best regards,
Zoltán Böszörményi

> So in your case, I'd probably go with the default fillfactor (90), and
> maybe I'd consider even using fillfactor=100, to make the index as dense
> as possible.
>
> Anyway I guess the impact of this will be negligible, compared to other
> parameters (shared buffers, work_mem, etc.).
>
> regards
> Tomas
>

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message pedz 2011-06-12 20:50:40 Out of tree build issue
Previous Message Vincent Veyron 2011-06-12 19:02:26 Re: Tweaking bytea / large object block sizes?