Re: Performant queries on table with many boolean columns

From: Rob Imig <rimig88(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performant queries on table with many boolean columns
Date: 2016-04-21 19:34:52
Message-ID: CANcrS5oHanMwESO0jaqROV44wQRNO6y_gXj8C4=uY27t3u7TfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hey all,

Lots of interesting suggestions! I'm loving it.

Just came back to this a bit earlier today and made a sample table to see
what non-index performance would be. Constructed data just like above (used
12M rows and 80% true for all 100 boolean columns)

Here's an analyze for what I'd expect to be the types of queries that I'll
be handling from the frontend. I would expect around 40-70 properties per
query.

Now I'm going to start experimenting with some ideas above and other
tuning. This isn't as bad as I thought it would be, though would like to
get this under 200ms.

rimig=# explain analyze select count(*) from bloomtest where prop0 AND
prop1 AND prop2 AND prop3 AND prop4 AND prop5 AND prop6 AND prop7 AND prop8
AND prop9 AND prop10 AND prop11 AND prop12 AND prop13 AND prop14 AND prop15
AND prop16 AND prop17 AND prop18 AND prop19 AND prop20 AND prop21 AND
prop22 AND prop23 AND prop24 AND prop25 AND prop26 AND prop27 AND prop28
AND prop29 AND prop30 AND prop31 AND prop32 AND prop33 AND prop34 AND
prop35 AND prop36 AND prop37 AND prop38 AND prop39 AND prop40 AND prop41
AND prop42 AND prop43 AND prop44 AND prop45 AND prop46 AND prop47 AND
prop48 AND prop49 AND prop50 AND prop51 AND prop52 AND prop53 AND prop54
AND prop55 AND prop56 AND prop57 AND prop58 AND prop59 AND prop60 AND
prop61 AND prop62 AND prop63 AND prop64;

Aggregate (cost=351563.03..351563.04 rows=1 width=0) (actual
time=2636.829..2636.829 rows=1 loops=1)

-> Seq Scan on bloomtest (cost=0.00..351563.02 rows=3 width=0) (actual
time=448.200..2636.811 rows=9 loops=1)

Filter: (prop0 AND prop1 AND prop2 AND prop3 AND prop4 AND prop5
AND prop6 AND prop7 AND prop8 AND prop9 AND prop10 AND prop11 AND prop12
AND prop13 AND prop14 AND prop15 AND prop16 AND prop17 AND prop18 AND
prop19 AND prop20 AND prop21 AND prop22 AND prop23 AND prop24 AND prop25
AND prop26 AND prop27 AND prop28 AND prop29 AND prop30 AND prop31 AND
prop32 AND prop33 AND prop34 AND prop35 AND prop36 AND prop37 AND prop38
AND prop39 AND prop40 AND prop41 AND prop42 AND prop43 AND prop44 AND
prop45 AND prop46 AND prop47 AND prop48 AND prop49 AND prop50 AND prop51
AND prop52 AND prop53 AND prop54 AND prop55 AND prop56 AND prop57 AND
prop58 AND prop59 AND prop60 AND prop61 AND prop62 AND prop63 AND prop64)

Rows Removed by Filter: 11999991

Total runtime: 2636.874 ms

On Thu, Apr 21, 2016 at 12:45 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Wed, Apr 20, 2016 at 11:54 AM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
> >>
> >> The obvious thing seems to make a table with ~100 columns, with 1 column
> >> for each boolean property. Though, what type of indexing strategy would
> >> one use on that table? Doesn't make sense to do BTREE. Is there a better
> >> way to structure it?
> >>
> > looks like a deal for contrib/bloom index in upcoming 9.6 release
>
> Not without doing a custom compilation with an increased INDEX_MAX_KEYS:
>
> ERROR: cannot use more than 32 columns in an index
>
> But even so, I'm skeptical this would do better than a full scan. It
> would be interesting to test that.
>
> Cheers,
>
> Jeff
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rob Imig 2016-04-22 13:57:39 Re: Performant queries on table with many boolean columns
Previous Message Jeff Janes 2016-04-21 16:45:56 Re: Performant queries on table with many boolean columns