Re: XML Index again

From: Chris Roffler <croffler(at)earthlink(dot)net>
To: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: XML Index again
Date: 2010-03-08 12:23:32
Message-ID: 3984722a1003080423v64533a50xb00c30f6084b40cb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Alban

thanks for your response. I understand what you are saying .....

>Your previous query wasn't about attributes in any specific position - it
returned documents that contained >more than zero attributes matching a
given name. What are you trying to do this time?

And that is exactly my problem, only if the first attribute in the
Attributes is a match the query returns successful.
Please see my example below

Thanks
Chris

CREATE INDEX xml_index
ON time_series
USING btree
((
(xpath('/AttributeList/Attributes/Attribute/Name/text()',
external_attributes))[1]::text));

SELECT id FROM time_series t1 WHERE EXISTS (
SELECT 1
FROM time_series t2
WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()',
external_attributes))[1]::text = 'xxxxx'
AND t2.id = t1.id
);

When I run the select it uses the index.

The first query I run is with 'xxxxx' = 'Attribute102021' this works like a
champ.
The second query is run with 'xxxx' = 'Attribute202021' and returns null. (
this is the second Attribute )

---- XML example

<AttributeList>
<Attributes>
<Attribute>
<Name>Attribute102021</Name>
<Value>111111111</Value>
<LastChanged>2010-03-05T05:44:36.796-05:00</LastChanged>
<ChangedBy>Chris</ChangedBy>
</Attribute>
<Attribute>
<Name>Attribute202021</Name>
<Value>222222222</Value>
<LastChanged>2010-03-05T05:44:36.796-05:00</LastChanged>
<ChangedBy>Maya</ChangedBy>
</Attribute>
</Attributes>
</AttributeList>

On Mon, Mar 8, 2010 at 1:27 PM, Alban Hertroys <
dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> wrote:

> On 8 Mar 2010, at 11:39, Chris Roffler wrote:
>
> > Alban
> >
> > Thanks for your help, your suggestion worked.
> >
> > I need another xpath expression to find any Attribute with Name ="xxxx"
> under the Attributes node. (not just if it is in a specific position)
> > see query below.
>
> Your previous query wasn't about attributes in any specific position - it
> returned documents that contained more than zero attributes matching a given
> name. What are you trying to do this time?
>
> > How do I create an index for this xpath expression ?
>
> You don't need to create another index (although one w/o the
> external_attributes column would probably be more convenient); the index you
> have contains those names already.
>
> Just make sure you use the same expression you used to create the index to
> match the part in your xml and compare it to the text you're looking for.
>
> If you want to use indexes on your xml, then you'll need to stop putting
> the variable parts of your queries inside your xpath expressions - you make
> them unindexable that way. So move those [Name='xxxx']'s out of your xpath
> expressions. Instead have the expressions result in the names so that you
> can compare them to the names stored in your index(es).
>
> It won't be as fast as looking for those names using xpath in an xml
> document, as every attribute name is a candidate for comparison now, but at
> least it's indexable.
>
> Alternatively you could try to build an index from the names contained in
> each xml document. Something like:
>
> CREATE INDEX xml_attribute_names
> ON time_series
> USING btree (xpath('/AttributeList/Attributes/Attribute/Name/text()'));
>
> This stores the array of all attribute names in an index. You can query for
> existence of specific attributes using the ANY operator on the resulting
> array (see archives and docs for how to do that).
>
> I believe (I've never needed to use arrays) the syntax is:
>
> SELECT * FROM time_series WHERE 'xxxx' = ANY
> (xpath('/AttributeList/Attributes/Attribute/text()'));
>
> It'll probably be faster than the previous solution, but use more disk
> space and memory.
>
> > SELECT * FROM time_series
> > WHERE
> array_upper((xpath('/AttributeList/Attributes/Attribute[Name="xxxxx"]',
> external_attributes)),1) > 0
> >
> >
> > On Sun, Mar 7, 2010 at 1:11 PM, Alban Hertroys <
> dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> wrote:
> ...
> > You seem to want to test for the existence of nodes with a specific name,
> maybe this is what you're looking for?:
> >
> > SELECT id FROM time_series t1 WHERE EXISTS (
> > SELECT 1
> > FROM time_series t2
> > WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()',
> external_attributes))[1]::text = ('Attribute122021', external_attributes)
> > AND t2.id = t1.id
> > );
>
> For clarity, if you would have an index on just that xpath expression -
> without the external_attributes column - this query would look simpler:
>
> SELECT id FROM time_series t1 WHERE EXISTS (
> SELECT 1
> FROM time_series t2
> WHERE
> xpath('/AttributeList/Attributes/Attribute/Name/text()')[1]::text =
> 'Attribute122021'
> AND t2.id = t1.id
> );
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
>
>
> !DSPAM:1034,4b94df38296921956520267!
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alban Hertroys 2010-03-08 12:35:25 Re: XML Index again
Previous Message Pavel Stehule 2010-03-08 12:18:16 Re: How to find out if row was modified by EXECUTE UPDATE ...