Re: How to do faster DML

From: Lok P <loknath(dot)73(at)gmail(dot)com>
To: dwhite(at)seawardmoon(dot)com
Cc: Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>, Francisco Olarte <folarte(at)peoplecall(dot)com>, haramrae(at)gmail(dot)com, hjp-pgsql(at)hjp(dot)at
Subject: Re: How to do faster DML
Date: 2024-02-04 15:30:08
Message-ID: CAKna9VZx=7VU6x--qO=guhFuAORbSa-Zqfsy5Mu2xxgtV-0xuw@mail.gmail.com
Lists: pgsql-general

On Sun, Feb 4, 2024 at 8:14 PM Dennis White <dwhite(at)seawardmoon(dot)com> wrote:

> I'm surprised no one has mentioned perhaps it's a good idea to partition
> this table while adding the pk. By your own statements the table is
> difficult to work with as is. Without partitioning the table, row inserts
> would need to walk the pk index and could be a factor. If this is a static
> table then perhaps that's OK, but if not...
>
> Anyway, I don't recall what type the ID was or how it's set, but I suggest
> you seriously investigate using it to partition this table into manageable
> smaller tables.
> Best of luck.
>
> On Sun, Feb 4, 2024, 8:00 AM Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> wrote:
>
>> insert into mytable2 select * from mytable1 order by ctid limit
>>> 10_000_000 offset 20_000_000;
>>>
>>
>> You can use min/max values grouping them by 10_000_000 records, so you
>> don´t need that offset, then generate commands and run them.
>>
>> select format('insert into mytable2 select * from mytable1 where i
>> between %s and %s;', min(i), max(i)) from mytable1 group by i/10_000_000;
>>
>
Thank you so much.

You are correct. This table has been difficult to operate on: every read
query is struggling, so partitioning is something we have to consider. We
are hoping that, with a proper partitioning and indexing strategy, Postgres
will be able to handle this scale.
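
For context, here is a minimal sketch of the daily range-partitioned layout
I have in mind (table and column names are placeholders, assuming
declarative partitioning on PostgreSQL 11 or later, where a primary key on
a partitioned table must include the partition key):

    create table mytable (
        id bigint not null,
        transaction_date date not null,
        payload text,
        primary key (id, transaction_date)  -- PK must include the partition key
    ) partition by range (transaction_date);

    -- one partition per day, created ahead of time (or via a tool like pg_partman)
    create table mytable_2024_02_04 partition of mytable
        for values from ('2024-02-04') to ('2024-02-05');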

I have a few related questions.

1) Even after partitioning the target table, is there an option to disable
the primary and foreign keys to speed up the data load, and re-enable them
once the load finishes? Would that be a good idea, or would re-enabling the
constraints after all the data is persisted be cumbersome and resource
intensive? A sketch of what I mean follows.
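
To make the question concrete, this is roughly what I was imagining
(constraint and column names here are hypothetical; as far as I know,
Postgres has no way to merely "disable" a primary or foreign key, so it
would mean dropping and re-creating them):

    -- drop the constraints before the bulk load
    alter table mytable2 drop constraint mytable2_pkey;
    alter table mytable2 drop constraint mytable2_ref_fk;

    insert into mytable2 select * from mytable1;  -- bulk load

    -- re-create them afterwards; both steps scan/sort the full table
    alter table mytable2 add primary key (id, transaction_date);
    alter table mytable2 add constraint mytable2_ref_fk
        foreign key (ref_id) references reftable (id);

I understand an alternative for the foreign key might be adding it NOT
VALID and running VALIDATE CONSTRAINT later, which takes a weaker lock.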

2) I understand there is theoretically no limit on the number or size of
partitions a table can have in Postgres. But I would like to hear from the
experts here, based on real-life experience, whether there is anything we
should take care of before deciding the partitioning strategy, so that we
stay within any practical soft limits (both with respect to partition size
and partition count).
Note: Currently this table will be around ~1 TB in size and will hold
approximately ~3 billion rows (after removal of duplicates), but per
business needs it may grow to ~45 billion rows in the future.

3) As the table and each of its partitions will be very large and this is a
read-intensive application, compressing the historical partitions would
save storage space and should also help read-query performance. Can you
please shed some light on the compression strategy we should follow here
(considering a daily range-partitioned table with transaction_date as the
partition key)?
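
For example, is something along these lines the right direction? (Column
compression settings require PostgreSQL 14 or later, apply only to values
large enough to be TOASTed, and affect newly written rows only; existing
rows keep their old compression until the partition is rewritten. The
column name is a placeholder.)

    -- compress large values in newly written rows with lz4
    alter table mytable alter column payload set compression lz4;

    -- check which compression method existing values actually use
    select pg_column_compression(payload) as method, count(*)
    from mytable_2024_02_04
    group by 1;

My understanding is that heavier compression of cold partitions would need
something outside core Postgres (e.g. a columnar-storage extension), so any
advice on that front is welcome too.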

Regards
Lok
