Re: temp table on commit delete rows performance issue

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Floris Van Nee <florisvannee(at)optiver(dot)com>
Subject: Re: temp table on commit delete rows performance issue
Date: 2024-07-16 13:20:57
Message-ID: CAJ7c6TMBMr6-4rGCNwTXA0a5iAM6m3C9kHx65am+OQQuFYcKTA@mail.gmail.com
Lists: pgsql-hackers

Hi,

> I'm looking for some input on an issue I've observed. A common pattern
> I've seen is using temporary tables to put data in before updating the
> real tables. Something roughly like:
>
> On session start:
> CREATE TEMP TABLE temp_t1 (...) ON COMMIT DELETE ROWS;
>
> On update:
> BEGIN;
> COPY temp_t1 FROM STDIN (FORMAT BINARY);
> INSERT INTO t1 (SELECT * FROM temp_t1 ...) ON CONFLICT DO UPDATE SET ...;
> -- potentially some other operations on temp table to put data into real table t1
> COMMIT;
>
> This pattern starts to break down under certain exceptional circumstances of
> high concurrency. The "ON COMMIT DELETE ROWS" does a truncate that is
> fairly expensive and doesn't work well in high-concurrency scenarios. It's
> especially noticeable under the following circumstances:
> - high max_connections setting
> - high number of temp tables per session
> - concurrent writers at fairly short intervals
> The impact shows up both as reduced TPS on the primary and as the WAL
> replay process on the replica becoming completely overloaded (100% CPU
> even though not a lot of WAL is being generated)
>
> [...]

I didn't investigate your particular issue, but generally speaking
creating a table, even a temporary one, is an expensive operation.

Note that it is much more than just a separate file on disk. It touches
the catalog tables, shared buffers, all the corresponding locks, etc. If
the temporary table has indexes, the situation gets even worse. Sooner
or later VACUUM will have to process your bloated catalog, and that is
not fun under heavy load.
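
For what it's worth, to see how much catalog churn such a workload
generates, a rough check against the standard statistics views (just a
sketch, the list of catalogs here is arbitrary) could look like:

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_sys_tables
WHERE relname IN ('pg_class', 'pg_attribute', 'pg_type', 'pg_depend')
ORDER BY n_dead_tup DESC;

If n_dead_tup on pg_class / pg_attribute keeps growing between
autovacuum runs, the catalog bloat I mentioned is already happening.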

Is there any particular reason why you don't simply modify the target
table directly? As long as you do it within a transaction, you are safe.
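
For example (just a sketch, the column names are made up), the COPY into
the temp table plus the INSERT ... ON CONFLICT could collapse into a
single statement against t1:

BEGIN;
INSERT INTO t1 (id, payload)
VALUES (1, 'foo'), (2, 'bar')
ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;
COMMIT;

This skips the temp table, and therefore the truncate at COMMIT,
entirely.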

--
Best regards,
Aleksander Alekseev
