From: | Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: When to use PARTITION BY HASH? |
Date: | 2020-06-05 10:11:54 |
Message-ID: | CACACo5T4Cq1sx7iWDxQ85RFG5dQxR1i7qS8BKR5JTaPwS-_MtA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-performance |
On Thu, Jun 4, 2020 at 4:32 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Wed, Jun 3, 2020 at 7:55 AM Oleksandr Shulgin <
> oleksandr(dot)shulgin(at)zalando(dot)de> wrote:
>
> With hash partitioning you are not expected, in general, to end up with a
>> small number of partitions being accessed more heavily than the rest. So
>> your indexes will also not fit into memory.
>>
>> I have the feeling that using a hash function to distribute rows simply
>> contradicts the basic assumption of when you would think of partitioning
>> your table at all: that is to make sure the most active part of the table
>> and indexes is small enough to be cached in memory.
>>
>
> While hash partitioning doesn't appeal to me, I think this may be overly
> pessimistic. It would not be all that unusual for your customers to take
> turns being highly active and less active. Especially if you do occasional
> bulk loads all with the same customer_id for any given load, for example.
>
For a bulk load you'd likely want to go with an empty partition w/o indexes
and build them later, after loading the tuples. While it might not be
possible with any given partitioning scheme either, using hash partitioning
most certainly precludes that.
> So while you might not have a permanently hot partition, you could have
> partitions which are hot in turn. Of course you could get the same benefit
> (and probably better) with list or range partitioning rather than hash, but
> then you have to maintain those lists or ranges when you add new customers.
>
Why are LRU eviction from the shared buffers and OS disk cache not good
enough to handle this?
This actually applies to any partitioning scheme: the hot dataset could be
recognized by these caching layers. Does it not happen in practice?
--
Alex
From | Date | Subject | |
---|---|---|---|
Next Message | Ravi Krishna | 2020-06-05 10:14:42 | Re: Multitenent architecture |
Previous Message | Rob Sargent | 2020-06-05 10:09:10 | Re: Multitenent architecture |
From | Date | Subject | |
---|---|---|---|
Next Message | luis.roberto | 2020-06-05 11:16:42 | Re: Postgresql server gets stuck at low load |
Previous Message | Krzysztof Olszewski | 2020-06-05 10:07:02 | Postgresql server gets stuck at low load |