Re: Native XML

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Anton <antonin(dot)houska(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Native XML
Date: 2011-03-01 19:46:29
Message-ID: 4D6D4D15.9060206@dunslane.net
Lists: pgsql-hackers

On 03/01/2011 02:15 PM, Kevin Grittner wrote:
>
>>> Given that there were similar issues for other hierarchical data
>>> types, perhaps we need something similar to tsvector, but for
>>> hierarchical data. The extra layer of abstraction might not cost
>>> much when used for XML compared to the possible benefit with
>>> other data. It seems likely to be a very nice fit with GiST
>>> indexes.
>>>
>>> So under this idea, you would always have the text (or maybe byte
>>> array?) version of the XML, and you could "shard" it to a
>>> separate column for fast searches.
>
>> Tsearch should be able to handle XML now. It certainly knows how
>> to recognize XML tags.
>
> I apparently didn't express myself very well, since you seem to have
> *completely* missed my point. I know we can do tsearch2 searches
> against XML, or JSON, or YAML, or (insert next week's new favorite
> format here). What we can't currently do efficiently is search for
> particular values in some particular place in the hierarchy of a
> document. I've had loads of fun approximating it with regular
> expressions, but some days I'd like life to be easier.
>
> What I was arguing for is a new type which would represent the
> structure in a fashion which was independent of the particular text
> format and was efficient to traverse hierarchically. Done right,
> that would map well to GiST. Although, thinking about that some
> more, perhaps there would be a way to create a GiST index suitable
> for that straight from the XML text, and avoid the sharded column.
> A GiST index actually seems pretty close to what such a structure
> would look like anyway....
>

I probably didn't read your suggestion closely enough.

I think hierarchical data really only scratches the surface of the
problem. It would be nice to be able to specify all sorts of context for
searches:

* foo after bar
* foo near bar
* foo and bar in the same paragraph
* foo as a parent/child/ancestor/descendant/sibling/cousin of bar
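
A minimal sketch of what the last kind of check means on a parsed document, assuming Python's standard xml.etree.ElementTree. The helper name relation() and the sample tags are hypothetical, chosen only for illustration. It walks an in-memory tree and says nothing about how an index could answer the same question.

```python
import xml.etree.ElementTree as ET


def relation(root, a_tag, b_tag):
    """Describe how the first <a_tag> element relates to the first <b_tag>.

    Hypothetical helper for illustration only. Returns one of "same",
    "parent", "ancestor", "child", "descendant", "sibling", "unrelated",
    or None when either tag is missing from the document.
    """
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}

    a = next(root.iter(a_tag), None)
    b = next(root.iter(b_tag), None)
    if a is None or b is None:
        return None
    if a is b:
        return "same"

    def ancestors(node):
        # Nearest ancestor first, document root last.
        chain = []
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    a_up, b_up = ancestors(a), ancestors(b)
    if b_up and b_up[0] is a:
        return "parent"
    if any(n is a for n in b_up):
        return "ancestor"
    if a_up and a_up[0] is b:
        return "child"
    if any(n is b for n in a_up):
        return "descendant"
    if a_up and b_up and a_up[0] is b_up[0]:
        return "sibling"
    return "unrelated"


if __name__ == "__main__":
    doc = ET.fromstring("<doc><bar><x><foo/></x><y/></bar><z/></doc>")
    print(relation(doc, "foo", "bar"))
    print(relation(doc, "x", "y"))
```

Answering this for one document is easy once it is parsed; the hard part raised in this thread is answering it across many stored documents without parsing each one.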

cheers

andrew
