Re: Bit datatype performance?

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Antonio Vieiro <antonio(at)antonioshome(dot)net>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Bit datatype performance?
Date: 2011-09-14 17:32:08
Message-ID: bac7111715f2c9c982f3dd4ad78cbdc3@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I think it's not bad approach if performance is important. I don't know
how b-tree index will work with bitset datatype, but I assume it should
treat is as 256bit number (maybe someone more competive in internals
will answer this).

Please bear in mind, that this approach will work well only on query
You have written.

Because You ask on performance, I will add this topic You may want to
test and think about it

PG by default uses text transfer mode, so if you transfer your data
from/to server those will be transferred as 256 0/1 character string.
You may to think about storing tags as e.g. 4 long (64bit) fields, or 2
type 4 UUIDs (128bit) and use composite index. If you have ability to
use binary transfer and on your client side bitest will be mapped to
some "reasonable" type, then You won, otherwise (in binary mode) you
should get nice boost when you will store, those values in types I have
wrote.

Of course those are only some concepts, personally I have never made
such things.

Regards,
Radek

On Wed, 14 Sep 2011 17:58:58 +0200, Antonio Vieiro wrote:
> Hi again,
>
> Thanks for the tip. In fact I was thinking of creating an index on
> the bitmask, so I could use:
>
> ... where t.bits = :mymask
>
> directly, avoiding a full table scan. I assume this is possible
> (indexing bit and comparing bits), isn't it?
>
> Thanks,
> Antonio
>
> El 14/09/11 15:58, Radosław Smogura escribió:
>> On Wed, 14 Sep 2011 12:00:35 +0200, Antonio Vieiro wrote:
>>> Hi all,
>>>
>>> One of my entities 'E' may be 'tagged' with an arbitrary set of 256
>>> tags 'T'.
>>>
>>> A first approach could be to add a M:N relationship between 'E' and
>>> 'T'.
>>>
>>> A second way to do this could be to add a BIT(256) datatype to 'E',
>>> setting bits to '1' if the entity is tagged with each one of the
>>> 256
>>> tags (i.e. using a 'bitmask' on the set of tags).
>>>
>>> Since querying entities 'E' with a certain set of tags 'T' must be
>>> very fast I was wondering if the second approach would be faster.
>>> What
>>> do you think?
>>>
>>> Thanks for any hints,
>>> Antonio
>>
>> I assume each entity may have one or more different tags.
>>
>> Actually performing test like
>> ... where (t.bits & :mymask) = :mymask
>> should be quite fast and faster then creating additional relations,
>> but
>> only if it's highly probable that your query will almost always scan
>> whole table.
>>
>> The advantage of indexes is that the index is used 1st and tail
>> (slower)
>> parts of query will always get "subset" of table. In bitset, You
>> will
>> probably scan whole table.
>>
>> So I think, you should do some performance test for large number of
>> data, and compare both ways. I think bitset will be fast for really
>> small data, but M:N relations may be faster for really large data
>> sets.
>>
>> You need to measure size of your database too, in M:N case with 256
>> tags
>> it may be quite large.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Eduardo Piombino 2011-09-14 17:51:21 Re: Bit datatype performance?
Previous Message Rich Shepard 2011-09-14 17:20:31 Re: Identifying Reason for Column Name Returned by SELECT