Re: How to do faster DML

From: veem v <veema0000(at)gmail(dot)com>
To: Lok P <loknath(dot)73(at)gmail(dot)com>
Cc: Greg Sabino Mullane <htamfids(at)gmail(dot)com>, dwhite(at)seawardmoon(dot)com, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>, Francisco Olarte <folarte(at)peoplecall(dot)com>, haramrae(at)gmail(dot)com, hjp-pgsql(at)hjp(dot)at
Subject: Re: How to do faster DML
Date: 2024-02-05 19:24:35
Message-ID: CAB+=1TUQ2OHRqfGupzJHGwNDAcWksR-F9fWzQBFpCU80iiwDTw@mail.gmail.com
Lists: pgsql-general

I don't have much experience with Postgres at this scale, so others may
provide better suggestions. However, at this scale you will have a single
table of ~30TB+. As someone pointed out earlier in this thread, ~180
partitions per table is a good limit to stay within, and if that holds it
would be ~170GB per partition. That looks bulky for a system where read
response time is expected to be in milliseconds.
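
Rough back-of-the-envelope, using the ~300 bytes/row and ~500 million
rows/day figures quoted below (approximations only, not measurements):

    500 million rows/day * ~180 days  ≈ 90 billion rows
    90 billion rows * ~300 bytes/row  ≈ 27TB, so ~30TB+ with index overhead
    ~30TB / ~180 partitions           ≈ ~170GB per partition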

On Mon, 5 Feb 2024 at 16:51, Lok P <loknath(dot)73(at)gmail(dot)com> wrote:

>
>
> On Mon, 5 Feb, 2024, 2:21 am Lok P, <loknath(dot)73(at)gmail(dot)com> wrote:
>
>
> On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane <htamfids(at)gmail(dot)com>
> wrote:
>
> Partitioning will depend a lot on how you access the data. Is it always
> using that transaction_date column in the where clause? Can you share the
> table definition via \d? We've talked about this table quite a bit, but not
> actually seen what it looks like. We are flying blind a little bit. You
> mentioned your queries are slow, but can you share an explain analyze on
> one of these slow queries?
>
> 45 billion rows is quite manageable. How many rows are in each day? You
> may want to do a larger bucket than 24 hours per partition.
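
For illustration, a minimal sketch of what a weekly range-partitioned layout
might look like (table and column names here are placeholders, not taken from
the actual schema):

    -- Parent table, range-partitioned on the date used in the WHERE clauses.
    CREATE TABLE transactions (
        transaction_id   bigint        NOT NULL,
        transaction_date date          NOT NULL,
        amount           numeric(12,2),
        PRIMARY KEY (transaction_id, transaction_date)
    ) PARTITION BY RANGE (transaction_date);

    -- One partition per week instead of per day.
    CREATE TABLE transactions_2024_w06 PARTITION OF transactions
        FOR VALUES FROM ('2024-02-05') TO ('2024-02-12');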
>
> 1) Even after partitioning the target table, to speed up the data load on
> this table, is there an option to disable the primary and foreign keys and
> re-enable them after the data load finishes? Would that be a good idea?
>
>
> No.
>
> 3) As the size of the table and of each partition is going to be very
> large and this will be a read-intensive application, will compressing the
> historical partitions help us save storage space and also help read query
> performance?
>
>
> I am not sure what this means. If you are not going to need the data
> anymore, dump the data to deep storage and drop the partition.
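
If it helps, a rough sketch of that archive-and-drop idea (the partition,
database and file names are made up for illustration):

    -- Detach the old partition so it is no longer part of the table.
    ALTER TABLE transactions DETACH PARTITION transactions_2023_w40;

    -- From the shell: dump it in compressed custom format, then drop it.
    --   pg_dump -d mydb -t transactions_2023_w40 -Fc -f transactions_2023_w40.dump
    DROP TABLE transactions_2023_w40;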
>
> Cheers,
> Greg
>
>
>
> Thank you.
>
> The table has ~127 columns of different data types, a combination of
> numeric, varchar, date etc. Its current size is ~1TB, holding ~3 billion
> rows, and the row size is ~300 bytes.
>
> Currently it has a smaller volume, but in the future the daily
> transactions inserted into this table will be at most ~500 million
> rows/day. The plan is to persist at least ~3 months of transaction data,
> which will be around 45 billion rows in total. If all works well, we may
> need to persist ~6 months' worth of data in this database in the future,
> and that will be ~90 billion rows.
>
> This table will always be queried on the transaction_date column as one
> of the filter criteria. But the querying/search criteria can span from a
> day to a month's worth of transaction_date data.
>
> When you said "*You may want to do a larger bucket than 24 hours per
> partition.*", do you mean partitioning by week or so? Per the current
> math, ~3 billion rows take ~1TB of storage, so a daily range partition
> (holding ~500 million transactions/day) will be around ~100-150GB. Won't
> that be too much data for a single partition to operate on, and won't
> making the bucket larger (say, weekly) make the partitions even bulkier?
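
For what it's worth, spelling out that arithmetic with rough numbers only:
1TB / ~3 billion rows is roughly 300-350 bytes per row, so ~500 million
rows/day at that width comes to about 150-170GB per daily partition before
indexes, which is the same ballpark as the 100-150GB estimate above.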
>
> What I meant was, we will definitely need the data for querying purposes
> by the users, but just to keep the storage space in control (and to help
> the read queries), we were thinking of having the historical partitions
> compressed. For that, is there any compression strategy we should follow
> on Postgres?
>
> With regard to loading data into the table faster, I am wondering why you
> said '*NO*' to the approach of loading the data first and
> enabling/creating the primary key and foreign key constraints later.
> Because this table is a child table and the parent already has data in
> it, loading data into this table with the PK and FK in place makes it too
> slow, as it tries to validate those for each set of rows. So we were
> thinking that doing it at a later stage in one shot would be a faster
> approach. Please suggest.
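
For reference, a minimal sketch of that load-first pattern on a plain
(non-partitioned) child table, with made-up table and constraint names; note
that NOT VALID foreign keys have had restrictions on partitioned tables in
some Postgres versions:

    -- Drop the FK before the bulk load ...
    ALTER TABLE child_txn DROP CONSTRAINT child_txn_parent_fk;

    -- ... bulk load runs here ...

    -- Re-add it without scanning existing rows, then validate in one pass
    -- under a lighter lock than a plain ADD CONSTRAINT would take.
    ALTER TABLE child_txn
        ADD CONSTRAINT child_txn_parent_fk
        FOREIGN KEY (parent_id) REFERENCES parent_tab (parent_id)
        NOT VALID;

    ALTER TABLE child_txn VALIDATE CONSTRAINT child_txn_parent_fk;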
>
> I will try to collect some SELECT queries and post the explain analyze
> output. Currently we are trying to get rid of the duplicates.
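
When you do, EXPLAIN (ANALYZE, BUFFERS) output is usually the most useful
form, along these lines (the query itself is just a placeholder):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM   transactions
    WHERE  transaction_date >= DATE '2024-01-01'
      AND  transaction_date <  DATE '2024-02-01';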
>
> Regards
> Lok
>
>
> Any thoughts, based on the above usage pattern?
>
> I did the math based on the current stats with a limited data set; the
> partition size comes out to 100-150GB, as explained above, if we keep
> daily range partitions. Should we make them hourly instead?
>
> So I would like some expert views on whether this is going to work fine
> for a read-latency-intensive application, or whether we should take a
> different strategy.
>
