Re: How to do faster DML

From: Lok P <loknath(dot)73(at)gmail(dot)com>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: dwhite(at)seawardmoon(dot)com, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>, Francisco Olarte <folarte(at)peoplecall(dot)com>, haramrae(at)gmail(dot)com, hjp-pgsql(at)hjp(dot)at
Subject: Re: How to do faster DML
Date: 2024-02-06 03:57:38
Message-ID: CAKna9VbVVu9BiKxNvj=FVbpTkSxZoXWmqgNHvyfEdE+vnew8Bg@mail.gmail.com
Lists: pgsql-general

On Tue, Feb 6, 2024 at 8:34 AM Greg Sabino Mullane <htamfids(at)gmail(dot)com>
wrote:

> On Sun, Feb 4, 2024 at 3:52 PM Lok P <loknath(dot)73(at)gmail(dot)com> wrote:
>
>> This table will always be queried on the transaction_date column as one
>> of the filters criteria. But the querying/search criteria can span from a
>> day to a month worth of transaction date data.
>>
>
> In that case, and based on the numbers you provided, daily partitioning
> seems a decent solution.
>
> What I meant was, we will definitely need the data for querying purposes by
>> the users, but just to keep the storage space in control (and to help the
>> read queries), we were thinking of having the historical partitions
>> compressed. And for that, what compression strategy, if any, should we
>> follow on Postgres?
>>
>
> There is no compression strategy, per se. There are ways Postgres
> internally compresses the data (see "TOAST"), but in general, the table is
> either there or not. If space is a concern you should examine if you truly
> need 127 columns, make sure they are arranged correctly (search for
> 'postgres column tetris'), and move unused and older partitions elsewhere -
> like a separate cheaper Postgres server, or something offline.
>
> With regards to loading data into the table faster, wondering why you said
>> '*NO*' to the approach of loading the data first and creating the primary
>> key and foreign key constraints later.
>>
>
> For an initial load, this is fine, if you do things very carefully. I'd
> build the child table indexes post load but still feed things into the main
> table as an initial tradeoff, but YMMV.
>
> Just looked back and saw this is actually Aurora, not Postgres. Most of
> the advice on this thread should still apply, but be aware that things are
> not the same as Postgres, especially at the storage layer. For all the
> money you are giving them, don't forget to bug them as well.
>
> Cheers,
> Greg
>
>
Thank you so much Greg. That helps.

So when you said *"In that case, and based on the numbers you provided,
daily partitioning seems a decent solution."*, does that mean a partition
size of ~100-150GB per partition is not an issue for Postgres when serving
read-latency-sensitive queries?
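For context, the daily range partitioning being discussed could be sketched roughly as below. The table name, column list, and partition naming are hypothetical; only the transaction_date column comes from this thread:

```sql
-- Illustrative sketch only: names and types are assumptions.
CREATE TABLE transactions (
    transaction_id   bigint,
    transaction_date date NOT NULL
    -- ... remaining columns ...
) PARTITION BY RANGE (transaction_date);

-- One partition per day; the upper bound is exclusive.
CREATE TABLE transactions_2024_02_06 PARTITION OF transactions
    FOR VALUES FROM ('2024-02-06') TO ('2024-02-07');
```

With a filter on transaction_date, the planner can then prune to just the partitions covering the queried range, whether that is one day or a month.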

And yes, I think you are spot on regarding the storage. We actually loaded
CSV files whose sizes summed to ~200GB, and I was wondering how the table
grew to ~1TB once that data was loaded into the database. I was not aware
of "column tetris" and how Postgres adds alignment padding to column data
depending on the types of the subsequent columns; still trying to digest
the concept. We do have many columns with NULL values in them. We will try
to reorder the columns in the table and hopefully reduce the storage space
through that. Will raise a case too.
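The "column tetris" effect can be illustrated with a small sketch. The sizes and alignments below (bigint 8/8, boolean 1/1, smallint 2/2) are the usual ones on 64-bit builds, but the helper is a deliberate simplification: it ignores the tuple header, NULL bitmap, and variable-length types, and only shows why declaring the widest, most strictly aligned columns first reduces padding:

```python
def padded_row_size(cols, maxalign=8):
    """Rough per-row data size for columns given as (size, alignment),
    in declared order. Illustrative only, not Postgres's real heap layout."""
    offset = 0
    for size, align in cols:
        offset = (offset + align - 1) // align * align  # pad to column alignment
        offset += size
    return (offset + maxalign - 1) // maxalign * maxalign  # pad whole row

# bigint, boolean, bigint, smallint declared in an unfavorable order...
bad = [(8, 8), (1, 1), (8, 8), (2, 2)]
# ...versus reordered widest-alignment-first ("column tetris")
good = sorted(bad, key=lambda c: c[1], reverse=True)
print(padded_row_size(bad), padded_row_size(good))  # 32 24
```

Here the second bigint must start on an 8-byte boundary, so the lone boolean before it costs 7 bytes of padding per row; reordering recovers that space.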

Regards
Lok
