Re: Could use some advice on search architecture

From: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
To: Andy Colson <andy(at)squeakycode(dot)net>, PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Could use some advice on search architecture
Date: 2014-04-19 17:36:18
Message-ID: B6F6FD62F2624C4C9916AC0175D56D8828A66565@jenmbs01.ad.intershop.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


>On 04/19/2014 06:26 AM, Ron Pasch wrote:
>> > - It should be possible to search for products and provide properties
>> > that the product SHOULD have, not must have.
>>>
>>> I don't understand this. Say you have a sprocket in red and green. Do you want to search for:
>>>
>>> select * from product where name = 'sprocket' and (color = 'red' or color = 'green')
>>>
>>> Or do you want something else? Does the user say they'd "prefer" blue, but will take whatever else you have?
>>>
>>> Do you search for some properties exactly and some "preferred"?
>>>
>>> Perhaps you could describe a little more how you want to query the database? Or, maybe, what your user's are searching for?
>>>
>>> -Andy
>>>
>>
>> Yes, the user can prefer certain properties and the products that match most of the properties should be in the top of the results, but if a product doesn't match all of them but just some of them, they should still be returned, but lower in the results.
>>
>> I'm seriously wondering if doing this solely with postgres is even possible without having long execution times. I've done some tests with 5 million records and just doing the "or" construction you mentioned above, which resulted in 600 to 900 ms queries and returning only those records of which all properties match at least one selected value.

I don't think that OR clauses are the right way as the aim is to count how many attributes do match the search.
basically a standard approach would look like:

SELECT pID, sum(match) as matches FROM
(
selct pID, 1 as match from Products where color ='pink'
UNION ALL
selct pID, 1 as match from Products where size ='XXL'
...
)foo
GROUP BY pID order by matches DESC

How many distinct attributes are involved ? ( 15 colors + 9 sizes + ....)

Marc

>>
>> I was thinking that perhaps using a search engine like lucene or sphinx would be more appropriate, but then I wonder what I would exactly be indexing and how I would be querying that, but that's a question for a different mailing list ;-)
>
>Please keep the list cc'd, so others can help as well.
>
>Yeah, doing a bunch of or's is gonna have to test all 5 million products.
>
>I wonder if there is a way we can treat this like a two step process.
>
>1) cut down the number of products
>
>2) sort them by #matches, popularity, etc
>
>You've talked about #2, but how about #1. Is there any way to either include or exclude a product? Users don't just ask for red, they ask for "tires (maybe red)". Not all 5 million products are tires, right?
>
>-Andy

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Torsten Förtsch 2014-04-19 20:51:08 Disable an index temporarily
Previous Message Andy Colson 2014-04-19 15:04:16 Re: Could use some advice on search architecture