Re: When to use PARTITION BY HASH?

From: Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: When to use PARTITION BY HASH?
Date: 2020-06-05 10:11:54
Message-ID: CACACo5T4Cq1sx7iWDxQ85RFG5dQxR1i7qS8BKR5JTaPwS-_MtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Thu, Jun 4, 2020 at 4:32 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Wed, Jun 3, 2020 at 7:55 AM Oleksandr Shulgin <
> oleksandr(dot)shulgin(at)zalando(dot)de> wrote:
>
> With hash partitioning you are not expected, in general, to end up with a
>> small number of partitions being accessed more heavily than the rest. So
>> your indexes will also not fit into memory.
>>
>> I have the feeling that using a hash function to distribute rows simply
>> contradicts the basic assumption of when you would think of partitioning
>> your table at all: that is to make sure the most active part of the table
>> and indexes is small enough to be cached in memory.
>>
>
> While hash partitioning doesn't appeal to me, I think this may be overly
> pessimistic. It would not be all that unusual for your customers to take
> turns being highly active and less active. Especially if you do occasional
> bulk loads all with the same customer_id for any given load, for example.
>

For a bulk load you'd likely want to go with an empty partition w/o indexes
and build them later, after loading the tuples. While it might not be
possible with any given partitioning scheme either, using hash partitioning
most certainly precludes that.

> So while you might not have a permanently hot partition, you could have
> partitions which are hot in turn. Of course you could get the same benefit
> (and probably better) with list or range partitioning rather than hash, but
> then you have to maintain those lists or ranges when you add new customers.
>

Why are LRU eviction from the shared buffers and OS disk cache not good
enough to handle this?

This actually applies to any partitioning scheme: the hot dataset could be
recognized by these caching layers. Does it not happen in practice?

--
Alex

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ravi Krishna 2020-06-05 10:14:42 Re: Multitenent architecture
Previous Message Rob Sargent 2020-06-05 10:09:10 Re: Multitenent architecture

Browse pgsql-performance by date

  From Date Subject
Next Message luis.roberto 2020-06-05 11:16:42 Re: Postgresql server gets stuck at low load
Previous Message Krzysztof Olszewski 2020-06-05 10:07:02 Postgresql server gets stuck at low load