Quick Links

Re: When to use PARTITION BY HASH?

From:	Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>
To:	Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
Cc:	pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject:	Re: When to use PARTITION BY HASH?
Date:	2020-06-03 11:55:25
Message-ID:	CACACo5RxgQBO_cB97yFkWwLo_-Vs3WtUei2KszGg2LkJ95uiXA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-performance

(sticking to pgsql-general)

On Tue, Jun 2, 2020 at 7:45 PM Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
wrote:

>
> On Tue, Jun 2, 2020 at 10:17 AM Oleksandr Shulgin <
> oleksandr(dot)shulgin(at)zalando(dot)de> wrote:
>
>>
>> I was reading up on declarative partitioning[1] and I'm not sure what
>> could be a possible application of Hash partitioning.
>>
>> Is anyone actually using it? What are typical use cases? What benefits
>> does such a partitioning scheme provide?
>>
>> On its face, it seems that it can only give you a number of tables which
>> are smaller than the un-partitioned one, but I fail to see how it would
>> provide any of the potential advantages listed in the documentation.
>>
>
>

> From my point of view, hash partitioning is very useful for spreading out
> high insert/update load.
>

Do you also assign the partitions to different tablespaces as you've
hinted below or do you see performance improvement from partitioning
alone? How does that work? Does it give better results than using a RAID
to spread the disk IO, for example?

Yes its' true you end up with more smaller tables than one big large one,
> but remember the indexes are (often) tree data structures. Smaller trees
> are faster than bigger trees. By making the indexes smaller they are
> faster. Since the planner can knows to only examine the specific index it
> needs, this ends up being a lot faster.
>

That sounds logical, but can it be demonstrated? If the index(es) fit in
memory fully, it doesn't make a measurable difference, I guess?

With hash partitioning you are not expected, in general, to end up with a
small number of partitions being accessed more heavily than the rest. So
your indexes will also not fit into memory.

I have the feeling that using a hash function to distribute rows simply
contradicts the basic assumption of when you would think of partitioning
your table at all: that is to make sure the most active part of the table
and indexes is small enough to be cached in memory.

Regards,
--
Alex

In response to

Re: When to use PARTITION BY HASH? at 2020-06-02 17:45:12 from Michel Pelletier

Responses

Re: When to use PARTITION BY HASH? at 2020-06-04 14:32:42 from Jeff Janes
Re: When to use PARTITION BY HASH? at 2020-06-06 16:13:25 from Michel Pelletier

Browse pgsql-general by date

	From	Date	Subject
Next Message	Laurenz Albe	2020-06-03 12:05:44	Re: Replication conflicts despite hot_standby_feedback = on?
Previous Message	Julien Rouhaud	2020-06-03 11:41:23	Re: Replication conflicts despite hot_standby_feedback = on?

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Justin Pryzby	2020-06-03 15:09:59	Re: When to use PARTITION BY HASH?
Previous Message	Oleksandr Shulgin	2020-06-03 07:45:48	Re: When to use PARTITION BY HASH?