Re: When to use PARTITION BY HASH?

From: Imre Samu <pella(dot)samu(at)gmail(dot)com>
To: Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: When to use PARTITION BY HASH?
Date: 2020-06-05 11:48:19
Message-ID: CAJnEWw=hPeMiunN=XEkJM55mh+ytEkTXvGGoW8kbeZ8O4-+23g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

> "Bulk loads ...",

As I see - There is an interesting bulkload benchmark:

"How Bulkload performance is affected by table partitioning in PostgreSQL"
by Beena Emerson (Enterprisedb, December 4, 2019 )

*SUMMARY: This article covers how benchmark tests can be used to
demonstrate the effect of table partitioning on performance. Tests using
range- and hash-partitioned tables are compared and the reasons for their
different results are explained: 1. Range partitions
2. Hash partitions 3. Combination graphs
4. Explaining the behavior 5. Conclusion*

*"For the hash-partitioned table, the first value is inserted in the first
partition, the second number in the second partition and so on till all the
partitions are reached before it loops back to the first partition again
until all the data is exhausted. Thus it exhibits the worst-case scenario
where the partition is repeatedly switched for every value inserted. As a
result, the number of times the partition is switched in a
range-partitioned table is equal to the number of partitions, while in a
hash-partitioned table, the number of times the partition has switched is
equal to the amount of data being inserted. This causes the massive
difference in timing for the two partition types."*

https://www.enterprisedb.com/postgres-tutorials/how-bulkload-performance-affected-table-partitioning-postgresql

Regards,
Imre

Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de> ezt írta (időpont: 2020.
jún. 2., K, 19:17):

> Hi!
>
> I was reading up on declarative partitioning[1] and I'm not sure what
> could be a possible application of Hash partitioning.
>
> Is anyone actually using it? What are typical use cases? What benefits
> does such a partitioning scheme provide?
>
> On its face, it seems that it can only give you a number of tables which
> are smaller than the un-partitioned one, but I fail to see how it would
> provide any of the potential advantages listed in the documentation.
>
> With a reasonable hash function, the distribution of rows across
> partitions should be more or less equal, so I wouldn't expect any of the
> following to hold true:
> - "...most of the heavily accessed rows of the table are in a single
> partition or a small number of partitions."
> - "Bulk loads and deletes can be accomplished by adding or removing
> partitions...",
> etc.
>
> That *might* turn out to be the case with a small number of distinct
> values in the partitioning column(s), but then why rely on hash
> assignment instead of using PARTITION BY LIST in the first place?
>
> Regards,
> --
> Alex
>
> [1] https://www.postgresql.org/docs/12/ddl-partitioning.html
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Achilleas Mantzios 2020-06-05 12:05:34 Re: Oracle vs. PostgreSQL - a comment
Previous Message David Rowley 2020-06-05 10:15:32 Re: Potential optimisation for the creation of a partial index condition over a newly created nullable, non-default-valued column?

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2020-06-05 13:51:29 Re: When to use PARTITION BY HASH?
Previous Message Pavel Stehule 2020-06-05 11:37:48 Re: Postgresql server gets stuck at low load