From: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
To: bill(dot)poole(at)ymail(dot)com
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bulk DML performance
Date: 2025-03-17 14:14:14
Message-ID: CAKAnmmKvxMG7t8pv8FZPkrgU9Sh8V=6Y8HaK4QymK-ZA+i+mcA@mail.gmail.com
Lists: pgsql-performance
On Mon, Mar 17, 2025 at 4:19 AM <bill(dot)poole(at)ymail(dot)com> wrote:
> Can you help me understand why performing 3 million lookups on a b-tree
> index with all pages cached in memory takes so long?
It's not the lookup; it's writing the 3 million rows (and in this
particular upsert case, deleting 3 million rows and then inserting 3 million new ones).
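For context, a minimal sketch of the kind of bulk upsert under discussion (the
table and column names here, test(id, payload), are assumptions for illustration,
not the original poster's schema):

    -- Hypothetical shape of the 3-million-row upsert. Every conflicting row is
    -- rewritten: the old row version is dead after the statement and a new
    -- version is written, which is where most of the work goes.
    INSERT INTO test (id, payload)
    SELECT x, 'value ' || x
    FROM generate_series(1, 3_000_000) AS x
    ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;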
> > Well, that is not a great statement.
>
> Understood, but I was highlighting the performance of deleting 3 million
> rows identified by 3 million IDs, as opposed to deleting rows in a given
> range of IDs or deleting the whole table. It seems like deleting 3 million
> rows identified by 3 million IDs should be faster than updating 3 million
> rows (also identified by 3 million IDs).
>
It should indeed be faster. But keep in mind that a delete immediately after
that upsert now has twice as many row versions to walk through as the upsert did.
Also, a subselect like the one in your original query can lead to a large nested loop.
Try another variant, such as this one:
with ids as (select x from generate_series(1, 3_000_000) x)
delete from test using ids where id = x;
> > With the table as it is you won't get better performance if you want the
> > features that a relational database provides.
>
> Sorry to hear that. I had hoped there was room to improve this performance.
>
If pure upsert performance is the goal, remove the unique index and store a
timestamp along with your inserted data. Back to pure inserts again! (and a
few new downsides). When querying, only use the version of the row with the
highest timestamp.
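A minimal sketch of that approach, assuming an illustrative
test_versions(id, payload, loaded_at) layout rather than the original table
definition:

    -- No unique index on id, so every batch is a pure append.
    CREATE TABLE test_versions (
        id        bigint      NOT NULL,
        payload   text,
        loaded_at timestamptz NOT NULL DEFAULT clock_timestamp()
    );

    -- The bulk load becomes insert-only (or COPY), with no conflict handling.
    INSERT INTO test_versions (id, payload)
    SELECT x, 'value ' || x
    FROM generate_series(1, 3_000_000) AS x;

    -- Readers keep only the newest version of each id.
    SELECT DISTINCT ON (id) id, payload
    FROM test_versions
    ORDER BY id, loaded_at DESC;

A plain (non-unique) index on (id, loaded_at) can still support those reads
without putting a uniqueness check on every insert; maintaining it and pruning
old versions are among the new downsides mentioned above.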
Other random ideas:
* remove or consolidate columns you don't need, or can store in another
table
* pre-filter the rows in the app, so you can do a pure-insert (or COPY) of
known-to-be-new rows, then upsert the remaining rows
* use the smallest data types possible
* avoid or minimize toasted values
* pack your columns efficiently (e.g. reorder for 8-byte alignment; see the sketch after this list)
* put the indexes on a ram-based tablespace
* boost your work_mem (for things like giant deletes which build hashes)
* revisit unlogged tables and partitioning
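To make a few of those concrete, here is a rough sketch; the column names, data
types, and the 256MB figure are made-up examples rather than recommendations for
the original schema:

    -- Illustrative only: an unlogged variant of the table using small data types,
    -- with the widest fixed-width columns first (8-byte, then 4-, 2-, 1-byte)
    -- so alignment padding is minimized.
    CREATE UNLOGGED TABLE test_lean (
        id         bigint      NOT NULL,  -- 8 bytes
        updated_at timestamptz NOT NULL,  -- 8 bytes
        amount     integer     NOT NULL,  -- 4 bytes
        status     smallint    NOT NULL,  -- 2 bytes
        active     boolean     NOT NULL   -- 1 byte
    );

    -- Give a giant delete's hash more memory to work with (session-level setting).
    SET work_mem = '256MB';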
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support