From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Neil Conway <neilc(at)samurai(dot)com> |
Cc: | pgsql-patches <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: hash index work |
Date: | 2005-05-28 02:42:28 |
Message-ID: | 200505280242.j4S2gSn09485@candle.pha.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-patches |
Neil, I have added these item to the TODO list. Do you plan on applying
this?
---------------------------------------------------------------------------
Neil Conway wrote:
> This patch implements two changes to hash indexes:
>
> - rather than storing the values of the index keys, we just store their
> hash value instead. The hash opclasses have been marked "lossy" so that
> the executor will recheck the scan qualifier to catch hash collisions.
>
> - within a page of a hash bucket, the entries are stored sorted by hash
> value. When doing a hash index scan, we can then binary search within a
> page rather than using a linear search.
>
> Unfortunately, I haven't been able to measure any performance
> improvement from either of these changes :-\
>
> I've run tests on two types of tables: int4 keys and relatively short
> textual keys (I don't think there's much point testing longer keys: the
> patches should obviously be a win there, so I'm concerned about speeding
> up the common case). In both cases, the difference in index scan
> performance isn't significant: it's about the same with and without the
> patches. The indexes have been on tables with 1 million rows (of int4)
> and 400,000 rows (of text). I would test with larger tables, but
> creating hash indexes is so slow that even these sizes are painful
> enough. Since indexes of this size should be cached in memory the
> reduction in I/O for the text case isn't being measured (the index is
> about 30% smaller than it is when we store the full text value), but
> even so I would have expected the binary search to produce noticeable
> gains...
>
> Perhaps I've made a coding error that has pessimized the performance
> somehow (although nothing obvious shows up in profiles), or perhaps the
> linear scan of the pages of the hash bucket isn't a performance problem
> in the first place, at least for types with a cheap equality operator.
> Some oprofile data seems to support this -- for example, in the int4
> case, we spend less than 0.5% of the time in _hash_next and children,
> and only 1.8% of the runtime in hashgetmulti and children (with the
> vanilla CVS HEAD code).
>
> -Neil
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2005-05-28 04:08:49 | Re: [HACKERS] patches for items from TODO list |
Previous Message | Bruce Momjian | 2005-05-28 02:35:13 | Re: AllocSetReset improvement |