Re: Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?

From: Seref Arikan <serefarikan(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
Date: 2012-05-18 11:57:00
Message-ID: CAG1bHGMGKnaSPoLxe2Az=zi6xjCMyN4hMYpGU1Dt30SqS8jEuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Marti,
Thanks, this is exactly the kind of feedback I was looking for.
I am already storing the whole XML in a payload table actually. My problem
is, the queries are actually created in a domain specific langauge, and
then they are transformed to SQL.
There is a no way of knowing what kind of queries would be run over the XML
docs, so I'd be creating indices over and over with each incoming new
query. Still, I may be able to find a path in the middle, maybe using a
combination of the XML path based index and row based representation. Thaks
for the pointers to relevant index types.

Finally, ltree may be an alternative to xpath based indices, but I don't
know if that would be faster. The database may need to go beyond 100 milion
rows, and I'm not sure what would happen then with a row based
representation. I'll probably generate dummy data and compare performance
of the options.

Kind regards
Seref

On Fri, May 18, 2012 at 10:55 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:

> On Thu, May 17, 2012 at 8:40 PM, Seref Arikan <serefarikan(at)gmail(dot)com>
> wrote:
> > Is there a glaring error in my approach? Should I be better off with
> another
> > SQL query, or Ltree/XPATH queries?
>
> For the particular query you posted, I would suggest the following indexes:
>
> (rm_type_name, payload_id, parent_feature_mapping_id)
> And maybe:
> (rm_type_name, feature_name, payload_id, parent_feature_mapping_id)
>
> But overall, storing a hierarchical XML structure as rows in a table
> might not be the best approach. If performance is problematic, you
> might consider storing whole XML documents -- or fragments -- in an
> xml field and create expression indexes for the queries that you need,
> possibly with GIN/GiST.
>
> Now I haven't needed to do this myself, so what follows is just me
> trying out stuff to give you some ideas and certainly not "best
> practice" -- there are lots of different indexing strategies and
> different ways to do this.
>
> For example:
>
> CREATE TABLE foo (doc_id serial primary key, doc xml not null);
> CREATE INDEX foo_doc_id_exists_root_element_test ON foo (doc_id) WHERE
> xpath_exists('/root/element[text()="test"]', doc);
> CREATE INDEX foo_root_element_text_gin ON foo USING
> gin((xpath('/root/element/text()', doc)::text[]));
>
> To find documents which have <element>test</element>, using the above
> indexes:
>
> # explain analyze select * from foo where
> xpath_exists('/root/element[text()="test"]', doc);
> Bitmap Heap Scan on foo (cost=3.33..450.22 rows=4311 width=36)
> (actual time=0.025..0.026 rows=1 loops=1)
> Recheck Cond: xpath_exists('/root/element[text()="test"]'::text,
> doc, '{}'::text[])
> -> Bitmap Index Scan on foo_doc_id_exists_root_element_test
> (cost=0.00..2.26 rows=4311 width=0) (actual time=0.014..0.014 rows=1
> loops=1)
> Total runtime: 0.067 ms
>
> # explain analyze select * from foo where
> (xpath('/root/element/text()', doc)::text[]) @> array['test'];
> Bitmap Heap Scan on foo (cost=8.50..105.51 rows=65 width=32) (actual
> time=0.025..0.025 rows=1 loops=1)
> Recheck Cond: ((xpath('/root/element/text()'::text, doc,
> '{}'::text[]))::text[] @> '{test}'::text[])
> -> Bitmap Index Scan on foo_root_element_text_gin
> (cost=0.00..8.49 rows=65 width=0) (actual time=0.020..0.020 rows=1
> loops=1)
> Index Cond: ((xpath('/root/element/text()'::text, doc,
> '{}'::text[]))::text[] @> '{test}'::text[])
> Total runtime: 0.046 ms
> (5 rows)
>
> The GIN index lets you search for documents that have both "test" and
> "testing":
> (xpath('/root/element/text()', doc)::text[]) @> array['test','testing'];
>
> Regards,
> Marti
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message John Watts 2012-05-18 11:59:44 Re: difference in query plan when db is restored
Previous Message Piotr Ciechomski 2012-05-18 11:31:56 100% cpu usage on some postmaster processes kill the complete database