From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Adam L Beberg <beberg(at)mithral(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Array indexes, GIN? |
Date: | 2007-03-02 19:50:38 |
Message-ID: | 1172865038.13722.205.camel@dogma.v10.wvs |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Thu, 2007-03-01 at 19:59 -0800, Adam L Beberg wrote:
> On the surface, looks like a job for GIN, but GIN seems undocumented,
> specifically mentions it doesn't support the deletes we'll have many of
> since it's designed for word searching apparently, the performance
It can delete an entry for one of the keys of an index, it just can't
delete the key itself when the number of entries goes down to zero.
Because you only have O(100K) possible keys, that shouldn't be a
problem. The GIN indexes can reclaim space. If they couldn't, they
wouldn't be nearly as useful.
The time when you run into problems is when you have a huge, sparsely
populated keyspace, with a huge number of keys contained by no tuples in
the table.
However, for your application, GIN still might not be the right answer.
GIN can only return tuples which do contain some matching keys, it won't
return the number of matching keys in that tuple (that's not the job of
an index).
Let's run some numbers:
* percentage of tuples returned = 100K rows out of the 10M = 1%
* tuples per page = 8192 bytes / 32 (tuple header) + 8 (bigint) + 80
(10 bigints) = ~70. Let's say it's 50 due to some free space.
Based on those numbers, the GIN index is basically going to say "get
every other page". PostgreSQL will optimize that into a sequential scan
because it makes no sense to do a random fetch for every other page.
So, the fastest way you can do this (that I can see) is just fetch every
tuple and count the number of matches in each array. You know your data
better than I do, so replace those numbers with real ones and see if it
still makes sense.
The basic rule is that an index scan is useful only if it reduces the
number of disk pages you need to fetch enough to make up for the extra
cost of random I/O.
Regards,
Jeff Davis
From | Date | Subject | |
---|---|---|---|
Next Message | Ron | 2007-03-02 20:28:26 | Re: strange performance regression between 7.4 and 8.1 |
Previous Message | Alex Deucher | 2007-03-02 19:43:05 | Re: strange performance regression between 7.4 and 8.1 |