Re: When to use PARTITION BY HASH?

From: Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
To: Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: When to use PARTITION BY HASH?
Date: 2020-06-02 17:45:12
Message-ID: CACxu=vLWZtmpfZqKBi9R4z7NTpd6iEKLXfEWGHxTYMECoTZSpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Tue, Jun 2, 2020 at 10:17 AM Oleksandr Shulgin <
oleksandr(dot)shulgin(at)zalando(dot)de> wrote:

> Hi!
>
> I was reading up on declarative partitioning[1] and I'm not sure what
> could be a possible application of Hash partitioning.
>
> Is anyone actually using it? What are typical use cases? What benefits
> does such a partitioning scheme provide?
>
> On its face, it seems that it can only give you a number of tables which
> are smaller than the un-partitioned one, but I fail to see how it would
> provide any of the potential advantages listed in the documentation.
>

I'm sure there will be many delightful answers to your question, and I look
forward to them! From my point of view, hash partitioning is very useful
for spreading out high insert/update load. Yes its' true you end up with
more smaller tables than one big large one, but remember the indexes are
(often) tree data structures. Smaller trees are faster than bigger trees.
By making the indexes smaller they are faster. Since the planner can knows
to only examine the specific index it needs, this ends up being a lot
faster.

Postgres can also parallelize queries on partitions. This is different
from a parallel sequential scan, which can also happen per-partition, so
there are multiple levels of parallel opportunity.

And last that I can think of, you can put the different partitions in
different tablespaces, improving the total IO bandwidth.

-Michel

> With a reasonable hash function, the distribution of rows across
> partitions should be more or less equal, so I wouldn't expect any of the
> following to hold true:
> - "...most of the heavily accessed rows of the table are in a single
> partition or a small number of partitions."
> - "Bulk loads and deletes can be accomplished by adding or removing
> partitions...",
> etc.
>
> That *might* turn out to be the case with a small number of distinct
> values in the partitioning column(s), but then why rely on hash
> assignment instead of using PARTITION BY LIST in the first place?
>
> Regards,
> --
> Alex
>
> [1] https://www.postgresql.org/docs/12/ddl-partitioning.html
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Stephen Frost 2020-06-02 17:47:12 Re: When to use PARTITION BY HASH?
Previous Message David G. Johnston 2020-06-02 17:43:02 Re: When to use PARTITION BY HASH?

Browse pgsql-performance by date

  From Date Subject
Next Message Stephen Frost 2020-06-02 17:47:12 Re: When to use PARTITION BY HASH?
Previous Message David G. Johnston 2020-06-02 17:43:02 Re: When to use PARTITION BY HASH?