From: | Robin <robinstc(at)live(dot)co(dot)uk> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Could use some advice on search architecture |
Date: | 2014-04-19 12:10:48 |
Message-ID: | BLU0-SMTP1021DD054770E29AB1FA143E25C0@phx.gbl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
bottom post
On 19/04/2014 12:46, R. Pasch wrote:
> On 19-4-2014 9:38, Robin wrote:
>>
>> Well, given that there are known limited attributes, this is the type
>> of application that really really suits a column oriented database,
>> such as Sybase IQ (now sold by SAP). Its a neat product that scales.
>> Great performance with drag'n'drop analytics.
>>
>> Unless you can charm IQ out of SAP (it has been known to happen), you
>> might have to look at some other techniques
>>
>> So consider some binary data representation
>> Red - 1 (0000 0001)
>> Orange - 2 (0000 0010)
>> Yellow - 4 (0000 0100)
>> Green - 8 (0000 1000)
>> Blue - 16 (0001 0000)
>> Indigo - 32 (0010 0000)
>> Violet - 64 (0100 0000)
>>
>> This way, you can encode several colours in 1 value
>> Red or Green or Indigo = 1 + 8 + 32 = 41 = 0010 1001
>>
>>
>> Robin
>>
>>
> I stopped reading when I heard the word "sold by SAP" ;-) This project
> is solely build with open-source and freely available software.
>
> I've been thinking about using a binary data representation but didn't
> come to a solution to this specific problem quite yet. Per property of
> a product, only one bit would be 1 and the rest would be 0. What would
> a query look like to match all products that have a bit in the correct
> position?
>
> Say for instance these are a couple records (and yes, property values
> can be null as well)
>
> title, property1, property2, property3
> ================================
> product1, 0000 0001, 0000 0010, NULL
> product2, 0000 0100, 0100 0000, 0010 0000
> product3, 0010 0000, 0010 0000, 0100 0000
>
> Say that I would like to retrieve the products that either have
> property1 as 0010 0000, 1000 000 or 0000 0001. Combined that would be
> 0010 1001 and would have to match product1 and product3 as they both
> have their individual bit matching one of the bits being asked for.
> What would a where statement look like using this type of binary
> representation?
>
> If that would be fairly simple to do and fast (most important factor)
> then I could do an OR construction on all property columns and have
> something count the amount of properties that actually matched. Is
> that something you can do with a binary operator of some sort as well?
> Count the amount of overlapping bits?
>
> Say for instance I have a binary value of 0110 0101 and another binary
> value of 1100 0100, how could I found out how many bits matched? (in
> this case the number of matching bits would be 2)
>
I understand the reluctance to pay SAP-style rates, as a longtime DB
user, I have learned some 'charm' techniques.
However, I poked around a bit for alternatives, as I do like the
column-oriented approach, and found something called - /MonetDB /
<http://www.monetdb.com/Home/Features>- it apparently has a column-store
db kernel, and is open source - I suggest you have a look, if it does
what it says on the label, then it looks like a find.
There is a discussion of bitmask-trickiness here also dealing with
colours
<http://postgresql.1045698.n5.nabble.com/Bitmask-trickiness-td1865258.html>
Robin
<http://postgresql.1045698.n5.nabble.com/Bitmask-trickiness-td1865258.html>
From | Date | Subject | |
---|---|---|---|
Next Message | Dorian Hoxha | 2014-04-19 12:23:54 | Re: Could use some advice on search architecture |
Previous Message | Marc Mamin | 2014-04-19 11:49:23 | Re: Could use some advice on search architecture |