Re: When to use PARTITION BY HASH?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: When to use PARTITION BY HASH?
Date: 2020-06-02 17:47:12
Message-ID: 20200602174712.GX6680@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Greetings,

Please don't cross post to multiple lists without any particular reason
for doing so- pick whichever list makes sense and post to that.

* Oleksandr Shulgin (oleksandr(dot)shulgin(at)zalando(dot)de) wrote:
> I was reading up on declarative partitioning[1] and I'm not sure what could
> be a possible application of Hash partitioning.

Yeah, I tend to agree with this.

> Is anyone actually using it? What are typical use cases? What benefits
> does such a partitioning scheme provide?

I'm sure folks are using it but that doesn't make it a good solution.

> On its face, it seems that it can only give you a number of tables which
> are smaller than the un-partitioned one, but I fail to see how it would
> provide any of the potential advantages listed in the documentation.

Having smaller tables can be helpful when it comes to dealing with
things like VACUUM (particularly since, even though we can avoid having
to scan the entire heap, we have to go through the indexes in order to
clean them up and generally larger tables have larger indexes),
however..

> With a reasonable hash function, the distribution of rows across partitions
> should be more or less equal, so I wouldn't expect any of the following to
> hold true:
> - "...most of the heavily accessed rows of the table are in a single
> partition or a small number of partitions."
> - "Bulk loads and deletes can be accomplished by adding or removing
> partitions...",
> etc.
>
> That *might* turn out to be the case with a small number of distinct values
> in the partitioning column(s), but then why rely on hash assignment instead
> of using PARTITION BY LIST in the first place?

You're entirely correct with this- there's certainly no small number of
situations where you end up with a 'hot' partition when using hashing
(which is true in other RDBMS's too, of course...) and that ends up
being pretty painful to deal with.

Also, you're right that you don't get to do bulk load/drop when using
hash partitioning, which is absolutely one of the largest benefits to
partitioning in the first place, so, yeah, their usefullness is.. rather
limited. Better to do your own partitioning based on actual usage
patterns that you know and the database's hash function certainly
doesn't.

Thanks,

Stephen

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message TALLURI Nareshkumar 2020-06-02 18:18:08 RE: LOG: could not send data to client: Broken pipe
Previous Message Michel Pelletier 2020-06-02 17:45:12 Re: When to use PARTITION BY HASH?

Browse pgsql-performance by date

  From Date Subject
Next Message Oleksandr Shulgin 2020-06-03 07:38:49 Re: When to use PARTITION BY HASH?
Previous Message Michel Pelletier 2020-06-02 17:45:12 Re: When to use PARTITION BY HASH?