Re: Fillfactor for GIN indexes

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fillfactor for GIN indexes
Date: 2015-07-21 09:40:00
Message-ID: 55AE1370.8090804@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Has anyone done any performance testing of this?

The purpose of a fillfactor is to avoid the storm of page splits right
after building the index, when there are some random updates to the
table. It causes the index to bloat, as every full page is split to two
half-full pages, and also adds latency to all the updates that have to
split pages.

As the code stands, do we actually have such a problem with GIN? Let's
investigate. Let's create a little test table:

create table foo (id int4, t text);
insert into foo select g, 'foo' from generate_series(1, 1000000) g;

And some indexes on it:

-- B-tree index on id, 100%
create index btree_100 on foo (id) with (fillfactor=100);
-- B-tree index on id, 90% (which is the default)
create index btree_90 on foo (id) with (fillfactor=90);
-- GIN index on id. Id is different on each row, so this index has no
posting trees, just the entry tree.
create index gin_id on foo using gin (id) with (fastupdate=off);
-- GIN index on t. T is the same on each row, so this index consists of
a single posting tree.
create index gin_t on foo using gin (t) with (fastupdate=off);

Immediately after creation, the index sizes are:

postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------+-------+--------+-------+---------+-------------
public | btree_100 | index | heikki | foo | 19 MB |
public | btree_90 | index | heikki | foo | 21 MB |
public | gin_id | index | heikki | foo | 53 MB |
public | gin_t | index | heikki | foo | 1072 kB |
(4 rows)

Now let's update 1% of the table, spread evenly across the table, and
see what happens to the index sizes:

postgres=# update foo set id = id where id % 100 = 0;
UPDATE 10000
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------+-------+--------+-------+---------+-------------
public | btree_100 | index | heikki | foo | 39 MB |
public | btree_90 | index | heikki | foo | 21 MB |
public | gin_id | index | heikki | foo | 53 MB |
public | gin_t | index | heikki | foo | 1080 kB |
(4 rows)

As you can see, btree_100 index doubled in size. That's the effect we're
trying to avoid with the fillfactor, and indeed in the btree_90 index it
was avoided. However, the GIN indexes are not bloated either. Why is that?

The entry tree (demonstrated by the gin_id index) is not packed tightly
when it's built. If anything, we could pack it more tightly to make it
smaller to begin with. For the use cases where GIN works best, though,
the entry tree should be fairly small compared to the posting lists, so
it shouldn't make a big difference. For the posting tree, I think that's
because the way the items are packed in the segments. Even though we
pack the segments as tightly as possible, there is always some free
space left over on a page that's not enough to add another segment to it
at index build, but can be used by the updates to add items to the
existing segments. So in practice you always end up with some free space
on the posting tree pages, even without a fillfactor, so the "effective
fillfactor" even today is not actually 100%.

Based on this, I think we should just drop this patch. It's not useful
in practice.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Emre Hasegeli 2015-07-21 09:59:50 Re: [COMMITTERS] pgsql: Improve BRIN documentation somewhat
Previous Message Haribabu Kommi 2015-07-21 09:15:28 pg_hba_lookup function to get all matching pg_hba.conf entries