From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | Anton <antonin(dot)houska(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Native XML |
Date: | 2011-03-01 19:46:29 |
Message-ID: | 4D6D4D15.9060206@dunslane.net |
Lists: | pgsql-hackers |
On 03/01/2011 02:15 PM, Kevin Grittner wrote:
>
>>> Given that there were similar issues for other hierarchical data
>>> types, perhaps we need something similar to tsvector, but for
>>> hierarchical data. The extra layer of abstraction might not cost
>>> much when used for XML compared to the possible benefit with
>>> other data. It seems likely to be a very nice fit with GiST
>>> indexes.
>>>
>>> So under this idea, you would always have the text (or maybe byte
>>> array?) version of the XML, and you could "shard" it to a
>>> separate column for fast searches.
>
>> Tsearch should be able to handle XML now. It certainly knows how
>> to recognize XML tags.
>
> I apparently didn't express myself very well, since you seem to have
> *completely* missed my point. I know we can do tsearch2 searches
> against XML, or JSON, or YAML, or (insert next week's new favorite
> format here). What we can't currently do efficiently is search for
> particular values in some particular place in the hierarchy of a
> document. I've had loads of fun approximating it with regular
> expressions, but some days I'd like life to be easier.
>
> What I was arguing for is a new type which would represent the
> structure in a fashion which was independent of the particular text
> format and was efficient to traverse hierarchically. Done right,
> that would map well to GiST. Although, thinking about that some
> more, perhaps there would be a way to create a GiST index suitable
> for that straight from the XML text, and avoid the sharded column.
> A GiST index actually seems pretty close to what such a structure
> would look like anyway....
>
I probably didn't read your suggestion closely enough.
I think hierarchical data really only scratches the surface of the
problem. It would be nice to be able to specify all sorts of context for
searches:
* foo after bar
* foo near bar
* foo and bar in the same paragraph
* foo as a parent/child/ancestor/descendant/sibling/cousin of bar
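
To make a few of these context predicates concrete, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. It is illustrative only: the sample document, the element names (foo, bar, para) and the helper function names are all assumptions invented for this example, and it scans the whole document on each call rather than using any kind of index, which is exactly the cost a GiST-style structure would aim to avoid.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are made up for illustration.
DOC = """
<doc>
  <para><bar>x</bar><foo>y</foo></para>
  <para><foo>z</foo></para>
  <bar><section><foo>w</foo></section></bar>
</doc>
"""

root = ET.fromstring(DOC)


def foo_after_bar(root):
    """Return True if some <foo> follows a <bar> in document order."""
    seen_bar = False
    for elem in root.iter():  # iter() walks elements in document order
        if elem.tag == "bar":
            seen_bar = True
        elif elem.tag == "foo" and seen_bar:
            return True
    return False


def paragraphs_with_both(root):
    """Return <para> elements that contain both a <foo> and a <bar>."""
    return [
        p for p in root.iter("para")
        if p.find(".//foo") is not None and p.find(".//bar") is not None
    ]


def foo_descendants_of_bar(root):
    """Return <foo> elements that have a <bar> ancestor.

    Nested <bar> elements would yield duplicates; this sketch ignores that.
    """
    return [f for b in root.iter("bar") for f in b.iter("foo")]


if __name__ == "__main__":
    print(foo_after_bar(root))
    print(len(paragraphs_with_both(root)))
    print([f.text for f in foo_descendants_of_bar(root)])
```

Each helper is a full traversal, so the sketch shows what the queries mean rather than how to answer them efficiently. "foo near bar" is left out because it first needs a definition of distance (words, elements, or tree depth).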
cheers
andrew