Re: Comparing user attributes with bitwise operators

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Patrick Clery <patrick(at)phpforhire(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Comparing user attributes with bitwise operators
Date: 2004-09-19 05:07:56
Message-ID: 87sm9eluc3.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Patrick Clery <patrick(at)phpforhire(dot)com> writes:

> PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=6.03..6.03 rows=1 width=68) (actual time=69.391..69.504 rows=10 loops=1)
> -> Sort (cost=6.03..6.03 rows=1 width=68) (actual time=69.381..69.418 rows=10 loops=1)
> Sort Key: age
> -> Index Scan using people_attributes_search on people_attributes (cost=0.00..6.02 rows=1 width=68) (actual time=0.068..61.648 rows=937 loops=1)
> Index Cond: (('{30,31,32,33,34,35,36,37,38,39,40}'::integer[] && age) AND ('{2}'::integer[] && gender) AND ('{1,2,4}'::integer[] && orientation))
> Total runtime: 69.899 ms
> (6 rows)
...
> - Is there a way of speeding up the sort?

The sort seems to have only taken 8ms out of 69ms or just over 10%. As long as
the index scan doesn't match too many records the sort should never be any
slower so it shouldn't be the performance bottleneck. You might consider
putting a subquery inside the order by with a limit to ensure that the sort
never gets more than some safe maximum. Something like:

select * from (select * from people_attributes where ... limit 1000) order by age limit 10

This means if the query matches more than 1000 it won't be sorted properly by
age; you'll get the top 10 out of some random subset. But you're protected
against ever having to sort more than 1000 records.

> - Will using queries like " WHERE orientation IN (1,2,4) " be any better/worse?

Well they won't use the GiST index, so no. If there was a single column with a
btree index then this would be the cleanest way to go.

> - The queries with the GiST index are faster, but is it of any benefit when
> the int[] arrays all contain a single value?

Well you've gone from 5 minutes to 60ms. You'll have to do more than one test
to be sure but it sure seems like it's of some benefit.

If they're always a single value you could make it an expression index instead
and not have to change your data model.

Just have the fields be integers individually and make an index as:

create index idx on people_attributes using gist (
(array[age]) gist__int_ops,
(array[gender]) gist__int_ops,
...
)

However I would go one step further. I would make the index simply:

create index idx on people_attributes using gist (
(array[age,gender,orientation,...]) gist__int_ops
)

And ensure that all of these attributes have distinct domains. Ie, that they
don't share any values. There are 4 billion integer values available so that
shouldn't be an issue.

Then you could use query_int to compare them the way you want. You
misunderstood how query_int is supposed to work. You index an array column and
then you can check it against a query_int just as you're currently checking
for overlap. Think of @@ as a more powerful version of the overlap operator
that can do complex logical expressions.

The equivalent of

where '{30,31,32,33,34,35,36,37,38,39,40}'::int[] && age
and '{2}'::int[] && gender
and '{1,2,4}'::int[] && orientation

would then become:

WHERE array[age,gender,orientation] @@ '(30|31|32|33|34|35|36|37|38|39|40)&(2)&(1|2|4)'

except you would have to change orientation and gender to not both have a
value of 2.

You might consider doing the expression index a bit of overkill actually. You
might consider just storing a column "attributes" with an integer array
directly in the table.

You would also want a table that lists the valid attributes to be sure not to
have any overlaps:

1 age 1
2 age 2
...
101 gender male
102 gender female
103 orientation straight
104 orientation gay
105 orientation bi
106 bodytype scrawny
...

> - Is there any hope for this structure?

You'll have to test this carefully. I tried using GiST indexes for my project
and found that I couldn't load the data and build the GiST indexes fast
enough. You have to test the costs of building and maintaining this index,
especially since it has so many columns in it.

But it looks like your queries are in trouble without it so hopefully it'll be
ok on the insert/update side for you.

--
greg

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message markir 2004-09-19 10:04:41 Re: Tryint to match Solaris-Oracle performance with
Previous Message Patrick Clery 2004-09-19 03:26:13 Re: Comparing user attributes with bitwise operators