From: | Eric Ridge <eebbrr(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Index bloat with "USING GIN(varchar[])" index? |
Date: | 2011-05-10 22:46:49 |
Message-ID: | BANLkTi=SNWAeFOKt+Pp2o0fN2eHNmOXNRw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
PostgreSQL 8.4.8 on i386-apple-darwin10.7.0, compiled by GCC
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664),
64-bit
I'm not sure exactly what's going on, but I've got a table defined like this:
CREATE TABLE foo (
...
tags varchar(1000)[],
...
);
CREATE INDEX idxfoo_tags ON foo USING GIN (tags);
I query the tags column quite a bit like so:
SELECT * FROM foo WHERE tags @> ARRAY['value']::varchar[];
Works great. Super fast... usually.
"foo" has roughly 50k records, and each row has anywhere from 2 to 5
elements in "tags". "tags" gets rewritten pretty regularly across
large swaths of records (1k - ~10k at a time) with different-ish
values. What I see happing is the above SQL going from a few
milliseconds to a few seconds. Until I REINDEX it. Then it performs
well again until lots of rewrites happen.
Before or after the REINDEX, the query plan is always the same (and
it's a good plan):
explain analyze SELECT * FROM foo WHERE tags @> ARRAY['CATTLE']::varchar[];
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=8.59..147.76 rows=38 width=496)
(actual time=8.870..10.073 rows=1654 loops=1)
Recheck Cond: (tags @> '{CATTLE}'::character varying[])
-> Bitmap Index Scan on idxfoo_tags87 (cost=0.00..8.58 rows=38
width=0) (actual time=8.806..8.806 rows=5034 loops=1)
Index Cond: (tags @> '{CATTLE}'::character varying[])
Total runtime: 10.258 ms
(5 rows)
I haven't had a chance to nail down a standalone test case to
reproduce this, but it smells like index bloat.
Are GIN indexes known for bloating, especially if they're on a
varchar[]? Any suggestions for how to prove/disprove that it's index
bloat?
eric
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Kellerer | 2011-05-11 06:17:31 | Documentation suggestion |
Previous Message | Andreas Joseph Krogh | 2011-05-10 20:37:07 | Re: Table name as parameter |