Re: Partitioning options

From: Marc Millas <marc(dot)millas(at)mokadb(dot)com>
To: Justin <zzzzz(dot)graf(at)gmail(dot)com>
Cc: Greg Sabino Mullane <htamfids(at)gmail(dot)com>, veem v <veema0000(at)gmail(dot)com>, sud <suds1434(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Partitioning options
Date: 2024-02-09 15:29:13
Message-ID: CADX_1abJB377Cgp9VZC2B6OyNxzqn+-pbSbM+oRx8O=OM7extA@mail.gmail.com
Lists: pgsql-general

On Thu, Feb 8, 2024 at 10:25 PM Justin <zzzzz(dot)graf(at)gmail(dot)com> wrote:

> Hi Sud,
>
> I would not look at HASH partitioning, as it is very expensive to add or
> remove partitions.
>
> I would probably look at nested partitioning on customer ID, using a
> range or list of IDs, then subpartition by transaction date. It's easy to
> add partitions and rebalance the partition segments.
>

I wouldn't do that, because when getting rid of obsolete data you must then
delete a huge number of records and vacuum each partition.
If you partition by date instead, cleanup becomes much easier: you just
drop the obsolete partitions, which is very fast (no DELETE, no vacuum, no
index updates, ...).
Marc

> Keep in mind that SELECT queries being used on the partition must use the
> partitioning KEY in the WHERE clause of the query or performance will
> suffer.
>
> I suggest doing a query analysis before deploying partitioning, to
> confirm that the queries' WHERE clauses match the planned partition rule.
> At least 80% of the executed queries should match the partition rule;
> if not, don't deploy partitioning, or change all the queries in the
> application to match the partition rule.
>
>
> On Thu, Feb 8, 2024 at 3:51 PM Greg Sabino Mullane <htamfids(at)gmail(dot)com>
> wrote:
>
>> Out of curiosity, As OP mentioned that there will be Joins and also
>>> filters on column Customer_id column , so why don't you think that
>>> subpartition by customer_id will be a good option? I understand List
>>> subpartition may not be an option considering the new customer_ids gets
>>> added slowly in the future(and default list may not be allowed) and also OP
>>> mentioned, there is skewed distribution of data for customer_id column.
>>> However what is the problem if OP will opt for HASH subpartition on
>>> customer_id in this situation?
>>>
>>
>> It doesn't really gain you much, given you would be hashing it, the
>> customers are unevenly distributed, and OP talked about filtering on the
>> customer_id column. A hash partition would just be a lot more work and
>> complexity for us humans and for Postgres. Partitioning for the sake of
>> partitioning is not a good thing. Yes, smaller tables are better, but they
>> have to be smaller targeted tables.
>>
>> sud wrote:
>>
>> 130GB of storage space as we verified using the "pg_relation_size"
>>> function, for a sample data set.
>>
>>
>> You might also want to closely examine your schema. At that scale, every
>> byte saved per row can add up.
>>
>> Cheers,
>> Greg
>>
>>
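To illustrate the earlier point that queries must filter on the partition key (a sketch, assuming a hypothetical `transactions` table range-partitioned on `txn_date`), partition pruning only kicks in when the key appears in the WHERE clause:

```sql
-- The planner prunes to the matching partition(s):
EXPLAIN SELECT * FROM transactions
 WHERE txn_date >= '2024-02-01' AND txn_date < '2024-03-01';

-- Without a filter on the partition key, every partition is scanned:
EXPLAIN SELECT * FROM transactions WHERE customer_id = 42;
```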
