From: | Csaba Nagy <nagy(at)domeus(dot)de> |
---|---|
To: | "'Felipe Schnack'" <felipes(at)ritterdosreis(dot)br>, "'pgsql-general'" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Efficient Boolean Storage |
Date: | 2002-12-04 17:02:06 |
Message-ID: | 96D568DD7FAAAD428581F8B3BFD9B0F604DE7D@goldmine.ecircle.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Well, I've had the same dilemma in what data type to use for efficient
boolean data storage.
My application has to do the following:
- store the state of max. 128 objects (this is more than the size of any of
the numeric types of postgres for which bitwise operators are defined);
- be able to use the state of any of these bits in a where clause. This
could be achieved by applying an AND mask and compare to 0.
- be able to have a variing nr. of bits.
Using 128 separate bit fields is not just insane but unpractical, because
these are not the only columns, there are plenty of others... also this
would mean to fix the nr. of bits, which is not good, as most of the time
the nr. of tracked objects will be below of the maximum of 128 possible.
Using bit varying type could be a solution, but here the problem is that you
can't apply bitwise operators to 2 operands with different bit length. This
would cause problems in queries, as I would need to apply the same bitmask
to rows with possibly different sizes of the bit field... in my case a
missing bit should mean false.
Furthermore, the bit variing type is decently documented, but the operators
you can apply to them are not.
Another problem is constructing a bit varying, which can be done only via
strings - this is at least unpleasant, to be forced to transform a number to
it's binary string representation to initialize the bit varying field with
it... if there is a way to do it directly with a number, it's undocumented.
Finally the solution was for me to convince the users they don't need more
than 64 bits :D , and use bigint.
But if in the future it turns out they need more, I will need better bit
level facilities than Postgres currently has...
BTW, is or is not using Postgres a string to store the bit data types ? I
have seen a few people asking this on the list, but no straight answer as of
yet... the docs really don't mention this aspect.
This makes a big difference for bit fields with lots of bits. 128 bits is 16
bytes, a 128 character string is at least 128 bytes... on 1 million records
this means about 100 MB difference in size, and also means slower data
transfer. If the string is toasted, I presume it means less difference in
storage, but still eats some more processing power and network bandwidth on
data transfer (unless the client-backend communications are compressed...
are they ?)
I hope I convinced everybody that the bit data type representation DOES make
a difference in those special cases where you have lots of bits...
Thanks for your attention,
Csaba.
-----Ursprüngliche Nachricht-----
Von: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org]Im Auftrag von Felipe Schnack
Gesendet: Mittwoch, 4. Dezember 2002 17:01
An: pgsql-general
Betreff: Re: [GENERAL] Efficient Boolean Storage
Why worry if a boolean isn't a bit in database storage?
When they are loaded to your CPU they will fill a register anyway (32
bits)
On Wed, 2002-12-04 at 14:01, Ericson Smith wrote:
> Hmmm...
>
> You can store them in an int4 type (will take up to 31)
>
> To store your numbers:
> $num = (2^$num1) + (2^$num2) ...
> where $num1 and $num2 are your bit positions that you want to set to
> "1",
>
> To retrieve them...
> SELECT * FROM table WHERE ((mycol & 2^1) != 0 OR (mycol & 2^3) != 0)
> Here you are checking for the 1st bit position and the 3rd bit position.
>
> - Ericson Smith
> eric(at)did-it(dot)com
>
> On Wed, 2002-12-04 at 10:11, Richard Huxton wrote:
> > On Wednesday 04 Dec 2002 1:06 am, Chris White wrote:
> > > Hello,
> > >
> > > I need to store many (e.g. 30) booleans and am wondering what is the
> > > most efficient way to store them. I can think of four options.
> > >
> > > Option 1...Use 'bool' data type
> > > No good since each value will require 1 byte rather than 1 bit.
> >
> > Depends what you're going to use them for. Are these 30 flags that
should be
> > grouped together? Will the users/app want all together or one at a time?
Will
> > they feature in WHERE clauses?
> >
> > --
> > Richard Huxton
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> > message can get through to the mailing list cleanly
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
--
Felipe Schnack
Analista de Sistemas
felipes(at)ritterdosreis(dot)br
Cel.: (51)91287530
Linux Counter #281893
Faculdade Ritter dos Reis
www.ritterdosreis.br
felipes(at)ritterdosreis(dot)br
Fone/Fax.: (51)32303328
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
From | Date | Subject | |
---|---|---|---|
Next Message | Matthew Gabeler-Lee | 2002-12-04 17:09:19 | Re: 7.3 no longer using indexes for LIKE queries |
Previous Message | Tom Lane | 2002-12-04 16:50:42 | Re: 7.3 no longer using indexes for LIKE queries |