From: | Amit Langote <amitlangote09(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | out of memory in crosstab() |
Date: | 2022-11-16 07:47:18 |
Message-ID: | CA+HiwqGsZhrvEFU_fK=zZ3UdVSgY7duRsGGqdJtoQPtCuhQ0jw@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
A customer seems to have run into $subject. Here's a reproducer they shared:
CREATE TABLE test (id integer, category integer, rate numeric);
INSERT INTO test
SELECT x.id,
y.category,
random() * 10 AS rate
FROM generate_series(1, 1000000) AS x(id)
INNER JOIN generate_series(1, 25) AS y(category)
ON 1 = 1;
SELECT * FROM crosstab('SELECT id, category, rate FROM test ORDER BY 1, 2')
  AS final_result(id integer, "1" numeric, "2" numeric, "3" numeric,
                  "4" numeric, "5" numeric);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
Time: 106095.766 ms (01:46.096)
!?> \q
With the following logged:
LOG: server process (PID 121846) was terminated by signal 9: Killed
DETAIL: Failed process was running: SELECT * FROM crosstab('SELECT
id, category, rate FROM test ORDER BY 1, 2') AS final_result(id
integer, "1" numeric, "2" numeric, "3" numeric, "4" numeric, "5"
numeric);
The problem seems to be spi_printtup() continuing to allocate memory
to expand _SPI_current->tuptable to store the result of crosstab()'s
input query that's executed using:
/* Retrieve the desired rows */
ret = SPI_execute(sql, true, 0);
Note that this asks SPI to retrieve and store *all* result rows of the
query in _SPI_current->tuptable, and if the query returns very many
rows, as in the example above, spi_printtup() ends up asking for more
memory than the system can provide.
The easiest fix for this seems to be for crosstab() to open a cursor
(SPI_cursor_open) and fetch the rows in batches (SPI_cursor_fetch)
rather than all in one go. I have implemented that in the attached
patch. Maybe the patch should also address other functions that
potentially have the same problem.
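For illustration, the cursor-based pattern looks roughly like the
following. This is a minimal sketch of the approach, not the attached
patch itself; the batch size of 1000 and the loop structure are
illustrative choices, and error handling is elided.

```
/*
 * Sketch only: instead of materializing every row with
 * SPI_execute(sql, true, 0), prepare the query, open a portal, and
 * pull rows in fixed-size batches so that only one batch at a time
 * is held in _SPI_current->tuptable.
 */
SPIPlanPtr  plan;
Portal      portal;

plan = SPI_prepare(sql, 0, NULL);
if (plan == NULL)
    elog(ERROR, "SPI_prepare failed for \"%s\"", sql);

portal = SPI_cursor_open(NULL, plan, NULL, NULL, true);

for (;;)
{
    /* fetch the next batch; batch size is arbitrary here */
    SPI_cursor_fetch(portal, true, 1000);
    if (SPI_processed == 0)
        break;

    for (uint64 i = 0; i < SPI_processed; i++)
    {
        HeapTuple tuple = SPI_tuptable->vals[i];

        /* ... process one input row of the crosstab ... */
    }

    /* release this batch before fetching the next one */
    SPI_freetuptable(SPI_tuptable);
}

SPI_cursor_close(portal);
```

The key point is that SPI_freetuptable() is called after each batch, so
the tuple table never grows beyond one batch regardless of how many
rows the input query produces.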
I also wondered about fixing this by making _SPI_current->tuptable use
a tuplestore, which can spill to disk, as its backing store rather than
a plain C HeapTuple array, but haven't checked how big a change that
would be; SPI_tuptable is referenced in many places across the tree.
Still, I suspect that idea has enough merit to be worth trying someday.
Thoughts on whether this should be fixed and the fix be back-patched?
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Teach-crosstab-to-use-SPI_cursor_-interface.patch | application/octet-stream | 13.1 KB |