Index bloat with "USING GIN(varchar[])" index?

From: Eric Ridge <eebbrr(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Index bloat with "USING GIN(varchar[])" index?
Date: 2011-05-10 22:46:49
Message-ID: BANLkTi=SNWAeFOKt+Pp2o0fN2eHNmOXNRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

PostgreSQL 8.4.8 on i386-apple-darwin10.7.0, compiled by GCC
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664),
64-bit

I'm not sure exactly what's going on, but I've got a table defined like this:

CREATE TABLE foo (
...
tags varchar(1000)[],
...
);
CREATE INDEX idxfoo_tags ON foo USING GIN (tags);

I query the tags column quite a bit like so:

SELECT * FROM foo WHERE tags @> ARRAY['value']::varchar[];

Works great. Super fast... usually.

"foo" has roughly 50k records, and each row has anywhere from 2 to 5
elements in "tags". "tags" gets rewritten pretty regularly across
large swaths of records (1k - ~10k at a time) with different-ish
values. What I see happing is the above SQL going from a few
milliseconds to a few seconds. Until I REINDEX it. Then it performs
well again until lots of rewrites happen.

Before or after the REINDEX, the query plan is always the same (and
it's a good plan):

explain analyze SELECT * FROM foo WHERE tags @> ARRAY['CATTLE']::varchar[];
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=8.59..147.76 rows=38 width=496)
(actual time=8.870..10.073 rows=1654 loops=1)
Recheck Cond: (tags @> '{CATTLE}'::character varying[])
-> Bitmap Index Scan on idxfoo_tags87 (cost=0.00..8.58 rows=38
width=0) (actual time=8.806..8.806 rows=5034 loops=1)
Index Cond: (tags @> '{CATTLE}'::character varying[])
Total runtime: 10.258 ms
(5 rows)

I haven't had a chance to nail down a standalone test case to
reproduce this, but it smells like index bloat.

Are GIN indexes known for bloating, especially if they're on a
varchar[]? Any suggestions for how to prove/disprove that it's index
bloat?

eric

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2011-05-11 06:17:31 Documentation suggestion
Previous Message Andreas Joseph Krogh 2011-05-10 20:37:07 Re: Table name as parameter