Quick Links

Re: When to use PARTITION BY HASH?

From:	Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject:	Re: When to use PARTITION BY HASH?
Date:	2020-06-05 10:11:54
Message-ID:	CACACo5T4Cq1sx7iWDxQ85RFG5dQxR1i7qS8BKR5JTaPwS-_MtA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-performance

On Thu, Jun 4, 2020 at 4:32 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Wed, Jun 3, 2020 at 7:55 AM Oleksandr Shulgin <
> oleksandr(dot)shulgin(at)zalando(dot)de> wrote:
>
> With hash partitioning you are not expected, in general, to end up with a
>> small number of partitions being accessed more heavily than the rest. So
>> your indexes will also not fit into memory.
>>
>> I have the feeling that using a hash function to distribute rows simply
>> contradicts the basic assumption of when you would think of partitioning
>> your table at all: that is to make sure the most active part of the table
>> and indexes is small enough to be cached in memory.
>>
>
> While hash partitioning doesn't appeal to me, I think this may be overly
> pessimistic. It would not be all that unusual for your customers to take
> turns being highly active and less active. Especially if you do occasional
> bulk loads all with the same customer_id for any given load, for example.
>

For a bulk load you'd likely want to go with an empty partition w/o indexes
and build them later, after loading the tuples. While it might not be
possible with any given partitioning scheme either, using hash partitioning
most certainly precludes that.

> So while you might not have a permanently hot partition, you could have
> partitions which are hot in turn. Of course you could get the same benefit
> (and probably better) with list or range partitioning rather than hash, but
> then you have to maintain those lists or ranges when you add new customers.
>

Why are LRU eviction from the shared buffers and OS disk cache not good
enough to handle this?

This actually applies to any partitioning scheme: the hot dataset could be
recognized by these caching layers. Does it not happen in practice?

--
Alex

In response to

Re: When to use PARTITION BY HASH? at 2020-06-04 14:32:42 from Jeff Janes

Responses

Re: When to use PARTITION BY HASH? at 2020-06-05 13:51:29 from Jeff Janes

Browse pgsql-general by date

	From	Date	Subject
Next Message	Ravi Krishna	2020-06-05 10:14:42	Re: Multitenent architecture
Previous Message	Rob Sargent	2020-06-05 10:09:10	Re: Multitenent architecture

Browse pgsql-performance by date

	From	Date	Subject
Next Message	luis.roberto	2020-06-05 11:16:42	Re: Postgresql server gets stuck at low load
Previous Message	Krzysztof Olszewski	2020-06-05 10:07:02	Postgresql server gets stuck at low load