From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [POC] A better way to expand hash indexes. |
Date: | 2017-03-30 13:53:09 |
Message-ID: | CA+TgmobZ1qP_g8h-mOwT1RFh_OXfMZQa5g21Rhe6Tm27OzPueA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Mar 28, 2017 at 1:13 AM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> wrote:
> B. In tuple sort we can use hash function bucket = hash_key %
> num_buckets instead of existing one which does bitwise "and" to
> determine the bucket of hash key. This way we will not wrongly assign
> buckets beyond max_buckets and sorted hash keys will be in sync with
> actual insertion order of _hash_doinsert.
I think approach B is incorrect. Suppose we have 1536 buckets and
hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193. If I
understand correctly, each of these values should be mapped either to
bucket 0 or to bucket 1, and the goal of the sort is to put all of the
bucket 0 tuples before all of the bucket 1 tuples, so that we get
physical locality when inserting. With approach A, the sort keys will
match the bucket numbers -- we'll be sorting the list 0, 1, 0, 1, 0,
1, 0, 1 -- and we will end up doing all of the inserts to bucket 0
before any of the inserts to bucket 1. With approach B, we'll be
sorting 512, 513, 1024, 1025, 0, 1, 512, 513 and will end up
alternating inserts to bucket 0 with inserts to bucket 1.
To put that another way, see this comment at the top of hashsort.c:
* When building a very large hash index, we pre-sort the tuples by bucket
* number to improve locality of access to the index, and thereby avoid
* thrashing. We use tuplesort.c to sort the given index tuples into order.
So, you can't just decide to sort on a random number, which is what
approach B effectively does. Or, you can, but it completely misses
the point of sorting in the first place.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Peter J. Holzer | 2017-03-30 13:57:45 | Re: Postgres Permissions Article |
Previous Message | Daniel Gustafsson | 2017-03-30 13:48:02 | Typo in libpq |