BUG #17896: xmlagg exponentially slower than string_agg equivalent

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: sf_postgresql(at)informationsoftworks(dot)com
Subject: BUG #17896: xmlagg exponentially slower than string_agg equivalent
Date: 2023-04-14 15:12:59
Message-ID: 17896-f4bf54f99b6a3089@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17896
Logged by: Paul Wehr
Email address: sf_postgresql(at)informationsoftworks(dot)com
PostgreSQL version: 14.6
Operating system: Debian GNU/Linux 11 (bullseye)
Description:

Not a bug in terms of functionality, but an interesting difference in
performance. If nothing else hopefully documenting this work-around might
help anyone else that may be aggregating xmlelements on this scale.

Using xmlagg(xml) on 60,000 rows of xml takes 16.3s on my hardware. If I
replace "xmlagg(xml)" with "string_agg(xml::text)::xml", I can get the same
result in 0.051s. (see script below). I have confirmed this behavior on
other hardware, and in versions 11.19, 13.7 and 14.6.

Also, while string_agg() seems to scale linearly (O(n)), xmlagg() seems to
scale somewhat steeper (O(n^2)?). Running the same script on various second
parameter values in the generate_series() function produced these results:

| | string_agg()
Rows | xmlagg(), ms | work-around, ms
-------+----------------+------------------
10000 | 314.44 | 14.207
20000 | 1111.104 | 23.754
30000 | 2558.706 | 37.565
40000 | 5306.631 | 51.021
50000 | 10178.496 | 61.928
60000 | 16306.651 | 75.146
70000 | 23853.331 | 93.529
80000 | 32243.908 | 105.318

-------- psql test script output ----------------

select version();
version

-----------------------------------------------------------------------------------------------------------------------------
PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

create temporary table xmlsample as
select xmlelement(name test, 'test') as xml
from generate_series(1,60000) as i (i)
;
SELECT 60000
explain analyze select xmlagg(xml) as xml from xmlsample;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1034.10..1034.11 rows=1 width=32) (actual
time=16304.009..16304.010 rows=1 loops=1)
-> Seq Scan on xmlsample (cost=0.00..903.88 rows=52088 width=32)
(actual time=0.021..12.754 rows=60000 loops=1)
Planning Time: 0.351 ms
Execution Time: 16306.651 ms
(4 rows)

explain analyze select string_agg(xml::text, '')::xml as xml from
xmlsample;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1034.10..1034.12 rows=1 width=32) (actual
time=63.076..63.078 rows=1 loops=1)
-> Seq Scan on xmlsample (cost=0.00..903.88 rows=52088 width=32)
(actual time=0.010..5.591 rows=60000 loops=1)
Planning Time: 0.098 ms
Execution Time: 75.146 ms
(4 rows)

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2023-04-14 19:00:01 BUG #17897: Crash on assignment to array of constraint-less domain over composite type
Previous Message Eric Wilfried Ettien 2023-04-14 10:06:38 Re: BUG #17895: Bug concerning UTF-8