Re: Confusion about composite indexes

From: Bill Mitchell <bill(at)publicrelay(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>, General PostgreSQL List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Confusion about composite indexes
Date: 2012-05-21 21:41:37
Message-ID: 4FBAB691.3070504@publicrelay.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks to everybody's input -- as a first-time poster to this listserv,
I wasn't sure how long it would take to get a response. ;)

I was frankly astonished to see that the composite index on (a,b) was
used when I searched for (a), but Chris' response makes total sense.

In this case, I don't want to go with a MAP due to the fact that I'm
actually using Java Hibernate to generate this schema and access it.

My sample query of using RANDOM() to select a random subset of the
overall outlets was actually to try and defeat any prior caching of
results, and give a more reasonable measurement -- but I didn't realize
the implications. I had thought that coupled with a MAX clause at the
end it would simply randomize and then bail out early instead of a full
table scan - so thanks to Merlin for pointing that out.

I'll go with a 2nd index on MEDIA_ID and do some measurements of speed
increase, but it makes a lot more sense now.

thank you Postgres gurus! :D

regards
Bill

On 5/21/12 5:11 PM, Merlin Moncure wrote:
> On Mon, May 21, 2012 at 3:36 PM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com> wrote:
>>> So you can get fully index lookups on all of a, b, ab, and ba. the
>>> primary key can't optimize ba because indexes only fully match if
>>> candidate fields are supplied from left to right order. They can
>>> still help somewhat, but to a lesser degree.
>> BTW, I would like to know is it worth it to create 3rd index on map(a)
>> to reduce the size of the index which will be used by the planer
>> to save some server's RAM (obviously, at the cost of extra disk space) ?
> What Dmitriy is talking about here is that even though an index on
> (a,b) can efficiently (in terms of searching through the tree) match
> terms on just 'a', you still pay a price because the entries on the
> index have to store the data for b as well, So even though it's
> algorithmically efficient you have to browse more data to do it which
> pressures RAM. In other words, an index on just 'a' is ideal for
> searches on just 'a', although a,b is much better than (b,a) or no
> index at all.
>
> I personally think that generally it's better not to do that in most
> cases especially if you're indexing integer keys since you're not
> making *that* much difference on the overall index size. Also,
> primary key indexes are much more likely to have to stay 'hot' in the
> cache anyways since they will be serving fkey reference lookups and
> stuff like that so in the end you might be consuming *more* ram, not
> less.
>
> An exception might be if your key on a,b has a very small 'a' and a
> very large 'b'. But that's pretty rare in practice and it's usually a
> good idea to avoid indexing large fields if you can help it. It
> really depends on the workload.
>
> merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2012-05-21 21:44:53 Re: how to for loop with distinct values?
Previous Message Merlin Moncure 2012-05-21 21:11:21 Re: Confusion about composite indexes