Re: Hash partitioning.

From: "Yuri Levinsky" <yuril(at)celltick(dot)com>
To: "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Christopher Browne" <cbbrowne(at)gmail(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL Mailing Lists" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash partitioning.
Date: 2013-06-26 14:22:07
Message-ID: B72526FA2066E344AFD09734A487318103E92BD3@falcon1.celltick.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heiki,
This is most professional explanation that I ever seen. Let me please
disagree with a bottom line. It's heavily depends on amount of memory
and actual index sizes. I did a benchmark ~6 years ago and I won a glass
of beer. Anyway I am talking about hash partitioning as a feature and
my example about compare with unique b-tree index scan is little bit
extreme. In case you have 2,4,8..1024 different values (not known in
advance) the index might be eliminated. That's whole the feature: no
competition for hash function.

Sincerely yours,

Yuri Levinsky, DBA
Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel
Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222

-----Original Message-----
From: Heikki Linnakangas [mailto:hlinnakangas(at)vmware(dot)com]
Sent: Wednesday, June 26, 2013 5:10 PM
To: Yuri Levinsky
Cc: Tom Lane; Christopher Browne; Robert Haas; Bruce Momjian; PostgreSQL
Mailing Lists
Subject: Re: [HACKERS] Hash partitioning.

On 26.06.2013 16:41, Yuri Levinsky wrote:
> Heikki,
> As far as I understand the height of the btree will affect the number
> of I/Os necessary. The height of the tree does not increase linearly
> with the number of records.

The height of a b-tree is O(log n), where n is the number of records.
Informally, if we assume that you have on average, say, 1000 keys on one
b-tree page, a two-level b-tree can hold one million items, and a three
level one billion items, and so on. The height of the tree affects the
number of I/Os needed for searches, but keep in mind that the top levels
of the tree are going to be very frequently accessed and in practice
will stay permanently cached. You will only perform actual I/O on the
1-2 bottom levels of the tree (or 0 if it all fits in cache)

Now let's compare that with a hash partitioned table, with 1000
partitions and a b-tree index on every partition. When doing a search,
you first hash the key to look up the right partition, then you search
the index of that partition. This is almost equivalent to just having a
b-tree that's one level taller - instead of looking up the right
partition in the hash table, you look up the right child page at the
root of the b-tree. From a very coarse theoretical point of view, the
only difference is that you replaced the binary search on the b-tree
root page with an equivalent hash lookup. A hash lookup can be somewhat
cheaper than binary search, but in practice there is very little
difference. There certainly isn't any difference in the number of actual
I/O performed.

In practice, there might be a lot of quirks and inefficiencies and
locking contention etc. involved in various DBMS's, that you might be
able to work around with hash partitioning. But from a theoretical point
of view, there is no reason to expect just partitioning a table on a
hash to make key-value lookups any faster.

- Heikki

This mail was received via Mail-SeCure System.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2013-06-26 14:22:35 Developer meeting photos
Previous Message Bruce Momjian 2013-06-26 14:14:20 Re: Hash partitioning.