Re: When to use PARTITION BY HASH?

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ron <ronljohnsonjr(at)gmail(dot)com>
Cc: PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: When to use PARTITION BY HASH?
Date: 2020-06-08 21:25:12
Message-ID: CAApHDvqsGdqZR4hMi3W3POvhOSMLcEB+fPprY_OgDZUN8gb01Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Tue, 9 Jun 2020 at 01:07, Ron <ronljohnsonjr(at)gmail(dot)com> wrote:
>
> On 6/8/20 3:40 AM, Oleksandr Shulgin wrote:
> [snip]
>
> I've found the original commit adding this feature in version 11: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e
> It says:
>
> "Hash partitioning is useful when you want to partition a growing data
> set evenly. This can be useful to keep table sizes reasonable, which
> makes maintenance operations such as VACUUM faster, or to enable
> partition-wise join."
>
>
> How does hashed (meaning "randomly?) distribution of records make partition-wise joins more efficient?

Hash partitioning certainly does not mean putting the tuple in some
random partition. It means putting the tuple in the partition with the
correct remainder value after dividing the hash value by the largest
partition modulus.

> Or -- since I interpret that as having to do with "locality of data" -- am I misunderstanding the meaning of "partition-wise joins"?

If it was not a partitioned table before then partition-wise joins
wouldn't be possible. Having partition-wise joins could make joining
two identically partitioned tables faster. We need only look in the
corresponding partition on the other side of the join for join
partners for each tuple. For hash joins, hash tables can be smaller,
which can mean not having to batch, and possibly having the hash table
fit better into a CPU cache. For merge joins, sorts, having the data
partially pre-sorted in chunks means fewer operations for qsort which
can result in speedups.

David

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2020-06-08 22:10:42 Re: Is it possible to use keywords (date units) in a function definition?
Previous Message Alistair Johnson 2020-06-08 21:21:43 Is it possible to use keywords (date units) in a function definition?

Browse pgsql-performance by date

  From Date Subject
Next Message Krzysztof Olszewski 2020-06-09 11:52:55 Re: Postgresql server gets stuck at low load
Previous Message Ron 2020-06-08 13:07:04 Re: When to use PARTITION BY HASH?