Re: Slow search.. quite clueless

From: Alex Turner <armtuk(at)gmail(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Yonatan Ben-Nes <da(at)canaan(dot)co(dot)il>, pgsql-general(at)postgresql(dot)org
Subject: Re: Slow search.. quite clueless
Date: 2005-09-20 18:12:29
Message-ID: 33c6269f05092011121d038541@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I"m by no means an expert on this, and perhaps someone with more knowledge
can help, but it looks to me like the planner estimate and the actual cost
are significantly different which to me means that an analyze is required,
or/and increase the stats on these tables would be usefull. Also I'm
wondering if you can avoid the dereference oid lookup by created the index
as keyword,product_id instead of just keyword.

Alex Turner
NetEconomist

On 9/20/05, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>
> contrib/tsearch2 ( http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/)
> might works for you. It might because performance depends on
> cardinality of your keywords.
>
> Oleg
> On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:
>
> > Hi all,
> >
> > Im building a site where the users can search for products with up to 4
> > diffrent keywords which all MUST match to each product which found as a
> > result to the search.
> >
> > I got 2 tables (which are relevant to the issue :)), one is the product
> table
> > (5 million rows) and the other is the keyword table which hold the
> keywords
> > of each product (60 million rows).
> >
> > The scheme of the tables is as follows:
> >
> > Table "public.product"
> > Column | Type | Modifiers
> > ----------------------------+---------------+---------------------
> > product_id | text | not null
> > product_name | text | not null
> > retail_price | numeric(10,2) | not null
> > etc...
> > Indexes:
> > "product_product_id_key" UNIQUE, btree (product_id)
> >
> > Table "public.keyword"
> > Column | Type | Modifiers
> > -------------+---------------+-----------
> > product_id | text | not null
> > keyword | text | not null
> > Indexes:
> > "keyword_keyword" btree (keyword)
> >
> > The best query which I succeded to do till now is adding the keyword
> table
> > for each keyword searched for example if someone search for "belt" &
> "black"
> > & "pants" it will create the following query:
> >
> > poweraise.com=# EXPLAIN ANALYZE SELECT
> >
> product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price
> > FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword
> t2
> > USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE
> > t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT
> 13;
> >
> > QUERY PLAN
> >
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=37734.15..39957.20 rows=13 width=578) (actual
> > time=969.798..1520.354 rows=6 loops=1)
> > -> Hash Join (cost=37734.15..3754162.82 rows=21733 width=578) (actual
> > time=969.794..1520.337 rows=6 loops=1)
> > Hash Cond: ("outer".product_id = "inner".product_id)
> > -> Nested Loop (cost=18867.07..2858707.34 rows=55309 width=612)
> > (actual time=82.266..1474.018 rows=156 loops=1)
> > -> Hash Join (cost=18867.07..2581181.09 rows=55309 width=34)
> > (actual time=82.170..1462.104 rows=156 loops=1)
> > Hash Cond: ("outer".product_id = "inner".product_id)
> > -> Index Scan using keyword_keyword on keyword t2
> > (cost=0.00..331244.43 rows=140771 width=17) (actual time=0.033..1307.167
> > rows=109007 loops=1)
> > Index Cond: (keyword = 'black'::text)
> > -> Hash (cost=18851.23..18851.23 rows=6337 width=17)
> > (actual time=16.145..16.145 rows=0 loops=1)
> > -> Index Scan using keyword_keyword on keyword t1
> > (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.067..11.050rows=3294
> > loops=1)
> > Index Cond: (keyword = 'belt'::text)
> > -> Index Scan using product_product_id_key on product
> > (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
> > loops=156)
> > Index Cond: (product.product_id = "outer".product_id)
> > -> Hash (cost=18851.23..18851.23 rows=6337 width=17) (actual
> > time=42.863..42.863 rows=0 loops=1)
> > -> Index Scan using keyword_keyword on keyword t3
> > (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120rows=3932
> > loops=1)
> > Index Cond: (keyword = 'pants'::text)
> > Total runtime: 1521.441 ms
> > (17 rows)
> >
> > Sometimes the query work fast even for 3 keywords but that doesnt help
> me if
> > at other times it take ages....
> >
> > Now to find a result for 1 keyword its really flying so I also tried to
> make
> > 3 queries and do INTERSECT between them but it was found out to be
> extremly
> > slow...
> >
> > Whats make this query slow as far as I understand is all the merging
> between
> > the results of each table... I tried to divide the keyword table into
> lots of
> > keywords table which each hold keywords which start only with a specific
> > letter, it did improve the speeds but not in a real significant way..
> tried
> > clusters,indexes,SET STATISTICS,WITHOUT OIDS on the keyword table and
> what
> > not.. im quite clueless...
> >
> > Actually I even started to look on other solutions and maybe you can say
> > something about them also.. maybe they can help me:
> > 1. Omega (From the Xapian project) - http://www.xapian.org/
> > 2. mnoGoSearch - http://www.mnogosearch.org/doc.html
> > 3. Swish-e - http://swish-e.org/index.html
> >
> > To add on everything I want at the end to be able to ORDER BY the
> results
> > like order the product by price, but im less concerned about that cause
> I saw
> > that with cluster I can do it without any extra overhead.
> >
> > Thanks alot in advance,
> > Yonatan Ben-Nes
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> > message can get through to the mailing list cleanly
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Philip Hallstrom 2005-09-20 18:14:31 Re: Slow search.. quite clueless
Previous Message Yonatan Ben-Nes 2005-09-20 17:42:47 Slow search.. quite clueless