From: | Tim Landscheidt <tim(at)tim-landscheidt(dot)de> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | XMLPARSE() evaluated multiple times? |
Date: | 2009-07-20 13:18:46 |
Message-ID: | m3my6zsdah.fsf@passepartout.tim-landscheidt.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi,
I have been playing around with PostgreSQL's XML support
lately (cf.
<URI:news:m3ljmocolf(dot)fsf(at)passepartout(dot)tim-landscheidt(dot)de>)
and stumbled upon some performance issues related to
XMLPARSE(). In my "application", the XML document is supp-
lied as a string constant via a DBI ? parameter, for testing
purposes I have put it into a separate table:
| tim=# \timing
| Zeitmessung ist an.
| tim=# SELECT LENGTH(XMLasText) FROM tmpTestData;
| length
| --------
| 364446
| (1 Zeile)
| Zeit: 6,295 ms
| tim=# SELECT SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM tmpTestData;
| substring
| -----------
| <
| (1 Zeile)
| Zeit: 40,072 ms
| tim=#
(The SUBSTRING()s above and following are for reasons of
brevity only; the results are comparable when the raw XML is
queried.)
| tim=# SELECT G.A, SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM generate_series(1, 10) AS G(A), tmpTestData;
| a | substring
| ----+-----------
| 1 | <
| [...]
| 10 | <
| (10 Zeilen)
| Zeit: 416,069 ms
| tim=# SELECT G.A, SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM generate_series(1, 100) AS G(A), tmpTestData;
| a | substring
| -----+-----------
| 1 | <
| [...]
| 100 | <
| (100 Zeilen)
| Zeit: 3029,196 ms
| tim=# SELECT G.A, SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM generate_series(1, 1000) AS G(A), tmpTestData;
| a | substring
| ------+-----------
| 1 | <
| 1000 | <
| (1000 Zeilen)
| Zeit: 30740,626 ms
| tim=#
It seems that XMLPARSE() is called for every row without
PostgreSQL realizing that it is IMMUTABLE. This even seems
to be the case if the XMLPARSE() is part of a WHERE clause:
| tim=# SELECT G.A FROM generate_series(1, 10) AS G(A) WHERE G.A::TEXT = XMLPARSE(DOCUMENT (SELECT XMLasText FROM tmpTestData))::TEXT;
| a
| ---
| (0 Zeilen)
| Zeit: 240,626 ms
| tim=# SELECT G.A FROM generate_series(1, 100) AS G(A) WHERE G.A::TEXT = XMLPARSE(DOCUMENT (SELECT XMLasText FROM tmpTestData))::TEXT;
| a
| ---
| (0 Zeilen)
| Zeit: 2441,135 ms
| tim=# SELECT G.A FROM generate_series(1, 1000) AS G(A) WHERE G.A::TEXT = XMLPARSE(DOCUMENT (SELECT XMLasText FROM tmpTestData))::TEXT;
| a
| ---
| (0 Zeilen)
| Zeit: 25228,180 ms
| tim=#
Obviously, the "problem" can be circumvented by "caching"
the results of the XMLPARSE() in a temporary table (or even
a IMMUTABLE function?), but I would assume that this should
be PostgreSQL's task.
Any thoughts why this is not the case already? :-)
Tim
From | Date | Subject | |
---|---|---|---|
Next Message | Robert James | 2009-07-20 13:23:46 | Re: Fastest char datatype |
Previous Message | Marcin Stępnicki | 2009-07-20 12:22:03 | Re: Full text search with ORDER BY performance issue |