Re: Could use some advice on search architecture

From: Robin <robinstc(at)live(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Could use some advice on search architecture
Date: 2014-04-19 12:10:48
Message-ID: BLU0-SMTP1021DD054770E29AB1FA143E25C0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

bottom post
On 19/04/2014 12:46, R. Pasch wrote:
> On 19-4-2014 9:38, Robin wrote:
>>
>> Well, given that there are known limited attributes, this is the type
>> of application that really really suits a column oriented database,
>> such as Sybase IQ (now sold by SAP). Its a neat product that scales.
>> Great performance with drag'n'drop analytics.
>>
>> Unless you can charm IQ out of SAP (it has been known to happen), you
>> might have to look at some other techniques
>>
>> So consider some binary data representation
>> Red - 1 (0000 0001)
>> Orange - 2 (0000 0010)
>> Yellow - 4 (0000 0100)
>> Green - 8 (0000 1000)
>> Blue - 16 (0001 0000)
>> Indigo - 32 (0010 0000)
>> Violet - 64 (0100 0000)
>>
>> This way, you can encode several colours in 1 value
>> Red or Green or Indigo = 1 + 8 + 32 = 41 = 0010 1001
>>
>>
>> Robin
>>
>>
> I stopped reading when I heard the word "sold by SAP" ;-) This project
> is solely build with open-source and freely available software.
>
> I've been thinking about using a binary data representation but didn't
> come to a solution to this specific problem quite yet. Per property of
> a product, only one bit would be 1 and the rest would be 0. What would
> a query look like to match all products that have a bit in the correct
> position?
>
> Say for instance these are a couple records (and yes, property values
> can be null as well)
>
> title, property1, property2, property3
> ================================
> product1, 0000 0001, 0000 0010, NULL
> product2, 0000 0100, 0100 0000, 0010 0000
> product3, 0010 0000, 0010 0000, 0100 0000
>
> Say that I would like to retrieve the products that either have
> property1 as 0010 0000, 1000 000 or 0000 0001. Combined that would be
> 0010 1001 and would have to match product1 and product3 as they both
> have their individual bit matching one of the bits being asked for.
> What would a where statement look like using this type of binary
> representation?
>
> If that would be fairly simple to do and fast (most important factor)
> then I could do an OR construction on all property columns and have
> something count the amount of properties that actually matched. Is
> that something you can do with a binary operator of some sort as well?
> Count the amount of overlapping bits?
>
> Say for instance I have a binary value of 0110 0101 and another binary
> value of 1100 0100, how could I found out how many bits matched? (in
> this case the number of matching bits would be 2)
>

I understand the reluctance to pay SAP-style rates, as a longtime DB
user, I have learned some 'charm' techniques.

However, I poked around a bit for alternatives, as I do like the
column-oriented approach, and found something called - /MonetDB /
<http://www.monetdb.com/Home/Features>- it apparently has a column-store
db kernel, and is open source - I suggest you have a look, if it does
what it says on the label, then it looks like a find.

There is a discussion of bitmask-trickiness here also dealing with
colours
<http://postgresql.1045698.n5.nabble.com/Bitmask-trickiness-td1865258.html>

Robin
<http://postgresql.1045698.n5.nabble.com/Bitmask-trickiness-td1865258.html>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dorian Hoxha 2014-04-19 12:23:54 Re: Could use some advice on search architecture
Previous Message Marc Mamin 2014-04-19 11:49:23 Re: Could use some advice on search architecture