Re: Native XML

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Anton <antonin(dot)houska(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Native XML
Date: 2011-02-28 16:23:58
Message-ID: 4D6BCC1E.3010406@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02/28/2011 10:51 AM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>> xpath_table is severely broken by design IMNSHO. We need a new design,
>> but I'm reluctant to work on that until someone does LATERAL, because a
>> replacement would be much nicer to design with it than without it.
> Well, maybe I'm missing something, but I don't really understand why
> xpath_table's design is so unreasonable. Also, what would a better
> solution look like exactly? (Feel free to assume LATERAL is available.)
>

What's unreasonable about it is that the supplied paths are independent
of each other, and evaluated in the context of the entire XML document.

Let's take the given example in the docs, changed slightly to assume
each piece of XML can have more than one article listing in it (i.e,.
'article' is not the root node of the document):

SELECT * FROM
xpath_table('article_id',
'article_xml',
'articles',
'//article/author|//article/pages|//article/title',
'date_entered> ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);

There is nothing that says that the author has to come from the same
article as the title, nor is there any way of saying that they must. If
an article node is missing author or pages or title, or has more than
one where its siblings do not, they will line up wrongly.

An alternative would be to supply a single xpath expression that would
specify the context nodes to be iterated over (in this case that would
be '//article') and a set of xpath expressions to be evaluated in the
context of those nodes (in this case 'article|pages|title' ort better
yet, supply these as a text array). We'd produce exactly one row for
each node found by the context expression, and take the first value
found by each of the column expressions in that context (or we could
error out if we found more than one, or supply an array if the result
field is an array). So with LATERAL taking care of the rest, the
function signature could be something like:

xpath_table_new(
doc xml,
context_xpath text,
column_xpath text[])
returns setof record

Given this, you could not get a row with title and author from different
article nodes in the source document like you can now.

cheers

andrew

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-02-28 16:39:17 EXPLAIN doesn't show sufficient info for wCTE cases
Previous Message Robert Haas 2011-02-28 16:23:10 Re: Native XML