Re: Bulk DML performance

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: bill(dot)poole(at)ymail(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bulk DML performance
Date: 2025-03-13 11:28:09
Message-ID: d58bb88b06a40233aef8e07a39fd234458abc009.camel@cybertec.at
Lists: pgsql-performance

On Thu, 2025-03-13 at 18:13 +0800, bill(dot)poole(at)ymail(dot)com wrote:
>
> it is noteworthy that inserting 27 MB of data into a newly created table creates
> 191 MB of data including the index and 127 MB of data excluding the index.

PostgreSQL has a lot of overhead per row.
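Much of that is fixed cost: each heap row carries a 23-byte tuple header
(padded to 24 bytes) plus a 4-byte line pointer, so 3 million rows account
for roughly 80 MB before any column data. If you want to see where the
space goes, something like this works (the table name is just a placeholder
for yours):

  SELECT pg_size_pretty(pg_relation_size('bulk_test'))       AS heap,
         pg_size_pretty(pg_indexes_size('bulk_test'))        AS indexes,
         pg_size_pretty(pg_total_relation_size('bulk_test')) AS total;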

>
> Can you help me understand why performing 3 million lookups on a b-tree index
> with all pages cached in memory takes so long?

It is probably not the lookup, but the *modification* of the index that is slow.

>
> It seems like deleting 3 million rows identified by 3 million IDs should be
> faster than updating 3 million rows (also identified by 3 million IDs).

It should be, yes.
To find out where the time is spent, use EXPLAIN (ANALYZE, BUFFERS) on the
statement.
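For example, roughly like this (table and column names are placeholders
for whatever you actually delete by):

  EXPLAIN (ANALYZE, BUFFERS)
  DELETE FROM bulk_test
  WHERE id = ANY ('{1,2,3}'::bigint[]);

The buffer counts show how many pages were hit, read and dirtied, which
points to where the time is going.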

> > To get better performance, the best I can think of is to parallelize loading
> > the data until you saturate CPU, disk or hit internal contention in the database.
>
> Sadly, I cannot do that because I need all rows to be inserted in a single
> database transaction, which I cannot do over multiple database connections.

Then the best you can do is to use COPY rather than INSERT.
It will perform better (but not vastly better).
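For example, from psql you could stream the data with something like this
(file, table and column names are placeholders):

  \copy bulk_test (id, payload) FROM 'data.csv' WITH (FORMAT csv)

COPY has less per-row overhead than INSERT, and it can run inside the same
transaction block as the rest of your work.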

Yours,
Laurenz Albe
