Storing and querying boolean fields

From: "Brown, Richard" <BrownRi(at)dnb(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Storing and querying boolean fields
Date: 2008-01-09 23:19:40
Message-ID: 84C1E648F506244C868CA8637C96A82402D4C5DD@DNBMSXBH003.dnbint.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi All,

First, some background:

- We are using PostgreSQL 7.3.4, and am locked into this version. I would
upgrade if I could, but the decision is not mine.
- The table referred to below is 120+ million rows, and has a width of 27
columns (15 smallints, 5 integers, 4 dates, 1 integer[], 1 single char text
field, and 1 double precision). This table already has 3 multi-field
indexes, as well as a primary key index.

I need to add many boolean flags to my database that need to be retrieved by
primary key, as well as be searched upon with decent speed. By "many", I
mean about 30 booleans total. The added bonus is that some of these values
need to be searched in an OR manner, while others need to be searched as an
AND. For example, I will need to do a search that looks something like:
select * from table where A && B && C && (D || E) && F

Our first thought was to "pack" these booleans into integer fields to save
space and help with searching (so we thought). The design calls for a
smallint field and an integer field to be added to the table. UPDATE/INSERT
would be handled by calculating the values of the two fields outside of the
database. Searching would be accomplished by taking the bits that were
requested, creating a bitmask corresponding to the 2 fields, and then doing
something like: select * from table where (field & BITMASK) = BITMASK. We
felt that putting this information in this table rather than a separate one
would help performance by eliminating a JOIN, and it was assumed (ugh) that
if we built an index on those two fields, that we would be able to use those
indexes for searching in the aforementioned manner.

Unfortunately we have come up with the following problems with this
approach:
1. Postgres will only use 1 index per table, so putting these fields in the
same table as several other searchable fields actually hurts us more than
helps.
2. We haven't been able to get any index scans in preliminary testing (all
seq. scans).

After all of that background, my actual question is: what is the best way
to effectively store and query this type of data? It seems more and more
that our initial design is just flat-out wrong, but we are at a loss as to
what the other options are.

Thanks very much in advance, and I apologize for the length of the message,
but I felt background info would be important.

Regards,
Richard

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joshua D. Drake 2008-01-09 23:21:55 Re: Kernel kills postgres process - help need
Previous Message Scott Marlowe 2008-01-09 22:59:45 Re: Kernel kills postgres process - help need