From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Next Steps with Hash Indexes
Date: 2021-08-11 16:39:51
Message-ID: CA+Tgmoa9kff2365gEW7Bhn2_twXcZMkCp5K8wfVE9iUVaaVYZg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Aug 11, 2021 at 11:17 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, agreed. The whole buckets-are-integral-numbers-of-pages scheme
> is pretty well designed to ensure bloat, but trying to ameliorate that
> by reducing the number of buckets creates its own problems (since, as
> you mention, we have no scheme whatever for searching within a bucket).
> I'm quite unimpressed with Simon's upthread proposal to turn off bucket
> splitting without doing anything about the latter issue.
Maybe. I don't think that it should be a huge problem to decide that
an occupied bucket has to consume an entire page; if that's a big
issue, you should just have fewer buckets. I do think it's a problem
that a bucket containing no tuples at all still consumes an entire
page, because the data is often going to be skewed so that many
buckets are entirely empty. I also think it's a problem that expanding
the directory from 2^{N} buckets to 2^{N+1} buckets requires 4
allocations of 2^{N-2} *consecutive* pages. That's a lot of
consecutive pages for even fairly modest values of N.
Imagine a design where we have a single-page directory with 1024
slots, each corresponding to one bucket. Each slot stores a block
number, which might be InvalidBlockNumber if there are no tuples in
that bucket. Given a tuple with hash value H, check slot H%1024 and
then go to that page to look further. If there are more tuples in that
bucket than can fit on the page, then it can link to another page. If
we assume for the sake of argument that 1024 is the right number of
buckets, this is going to use about as much space as the current
system when the data distribution is uniform, but considerably less
when it's skewed. The larger you make the number of buckets, the
better this kind of thing looks on skewed data.
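To make the single-level scheme concrete, here is a minimal in-memory sketch in C. The names (DirectoryPage, dir_lookup) and the INVALID_BLOCK constant are mine, standing in for InvalidBlockNumber; nothing here is actual PostgreSQL code.

```c
#include <stdint.h>

#define NSLOTS 1024
#define INVALID_BLOCK 0xFFFFFFFFu   /* stand-in for InvalidBlockNumber */

/* Hypothetical single-page directory: one slot per bucket, each slot
 * holding the block number of the first page of that bucket. */
typedef struct
{
    uint32_t slots[NSLOTS];
} DirectoryPage;

/* Given a tuple's hash value h, return the block to search next, or
 * INVALID_BLOCK if the bucket contains no tuples at all. */
static uint32_t
dir_lookup(const DirectoryPage *dir, uint32_t h)
{
    return dir->slots[h % NSLOTS];
}
```

The point of the design is visible in the data structure: an empty bucket costs four bytes of directory slot rather than a whole page, which is where the savings on skewed data come from.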
Now, you can't just always have 1024 buckets, so we'd actually have to
do something a bit more clever, probably involving multiple levels of
directories. For example, suppose a directory page contains only 32
slots. That will leave a lot of empty space in the page, which can be
used to store tuples. An index search has to scan all tuples that are
stored directly in the page, and also use the first 5 bits of the hash
key to search the appropriate bucket. But the bucket is itself a
directory: it can have some tuples stored directly in the page, and
then it has 32 more slots and you use the next 5 bits of the hash key
to decide which one of those to search. Then it becomes possible to
incrementally expand the hash index: when the space available in a
directory page fills up, you can either create a sub-directory and
move as many tuples as you can into that page, or add an overflow page
that contains only tuples.
It's important to be able to do either one, because sometimes a bucket
fills up with tuples that have identical hash values, and sometimes a
bucket fills up with tuples that have a variety of hash values. The
current implementation tends to massively increase the number of
buckets even when it does very little to spread the index entries out.
("Hey, I doubled the number of buckets and the keys are still almost
all in one bucket ... let me double the number of buckets again and
see if it works better this time!") If we were going to create a
replacement, we'd want the index to respond differently to a bunch of
dups vs. a bunch of non-dups.
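The split-or-overflow decision comes down to whether the next hash bits actually discriminate among the tuples in the full bucket. A hedged sketch of that test, assuming a directory consumes 5 bits of the hash per level (the function name and interface are invented for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* When a bucket page fills up, check whether the next 5 bits of the
 * hash would actually spread its tuples across sub-buckets.  If so,
 * creating a sub-directory helps; if every tuple carries the same
 * bits (typically duplicate keys), splitting is useless and an
 * overflow page is the right answer. */
static bool
split_would_help(const uint32_t *hashes, int n, int level)
{
    uint32_t first = (hashes[0] >> (5 * level)) & 31;

    for (int i = 1; i < n; i++)
        if (((hashes[i] >> (5 * level)) & 31) != first)
            return true;    /* tuples land in different sub-buckets */
    return false;           /* all identical at this level: overflow instead */
}
```

This is the kind of adaptivity the current implementation lacks: it keeps doubling the bucket count even when every key hashes to the same value.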
> As far as the specific point at hand is concerned, I think storing
> a hash value per index column, while using only the first column's
> hash for bucket selection, is what to do for multicol indexes.
> We still couldn't set amoptionalkey=true for hash indexes, because
> without a hash for the first column we don't know which bucket to
> look in. But storing hashes for the additional columns would allow
> us to check additional conditions in the index, and usually save
> trips to the heap on queries that provide additional column
> conditions. You could also imagine sorting the contents of a bucket
> on all the hashes, which would ease uniqueness checks.
That sounds reasonable, I guess, but I don't know how hard it would be to implement.
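For what it's worth, the per-column-hash layout seems simple enough at the tuple level. A rough sketch, with the struct layout, names, and MAX_COLS cap all invented here for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_COLS 4      /* illustrative cap, not a real limit */

/* Hypothetical index entry for a multicolumn hash index: one hash per
 * indexed column.  Only hashes[0] selects the bucket; the rest serve
 * to filter entries before visiting the heap. */
typedef struct
{
    uint32_t hashes[MAX_COLS];
    /* ... heap TID etc. would follow in a real entry ... */
} MultiColHashEntry;

/* Compare the stored hashes against the query's hashes for however
 * many leading columns the query supplied; a mismatch on any column
 * means the heap tuple cannot match, saving the heap fetch. */
static bool
entry_matches(const MultiColHashEntry *e,
              const uint32_t *query_hashes, int ncols)
{
    for (int i = 0; i < ncols; i++)
        if (e->hashes[i] != query_hashes[i])
            return false;
    return true;
}
```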
--
Robert Haas
EDB: http://www.enterprisedb.com