Re: SQL WHERE: many sql or large IN()

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: tom <tom(at)tacocat(dot)net>
Cc: Postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: SQL WHERE: many sql or large IN()
Date: 2007-04-06 13:35:54
Message-ID: Pine.LNX.4.64.0704061735050.12152@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Aha,

then why not use gin index for text[] ?

see, for example, my testing
http://www.sai.msu.su/~megera/wiki/GinTest

oleg
On Fri, 6 Apr 2007, tom wrote:

> If I read this right, intarray is for reading values from an array data type.
>
> I don't have this.
>
> I have a varchar() field that is indexed (unique), call it 'foo'
>
> I have a choice of running:
>
> SELECT bar FROM tokens WHERE foo IN ('apple','orange','biscuit'....) for up
> to ~300 words
>
> OR
>
> SELECT bar FROM tokens WHERE foo = 'apple' up to ~300 times as a
> prepared/cached SQL statements.
>
>
>
> On Apr 6, 2007, at 9:10 AM, Oleg Bartunov wrote:
>
>> Tom,
>>
>> have you seen contrib/intarray ?
>>
>> Oleg
>> On Fri, 6 Apr 2007, tom wrote:
>>
>>> I'm wondering where the differences are in running two different types of
>>> SQL statements.
>>>
>>> Given ~300 tokens/words I can either run 1 sql statement with a large list
>>> in a "WHERE foo IN (...300 tokens...)"
>>> or I can run ~300 statements, one for each token.
>>> In the first case, the SQL is not prepared, but just executed.
>>> In the second case, the SQL is prepared and run as a cached execution plan
>>> (I think).
>>>
>>> Now. It would seem that the second approach would be painfully slow. But
>>> I'm not sure that I'm seeing this.
>>> Currently I have <5 users. As always, this might change...
>>>
>>> Before I start going about coding and testing lots of stuff I thought I
>>> would ask for some historical experiences someone might have had when
>>> comparing these two approaches and if there are inflection points between
>>> the performance in terms of the number of tokens or simultaneous users.
>>>
>>> I should add that the tokens are either indexed or primary indexed but in
>>> both cases, unique, and not guaranteed to exist in every case.
>>>
>>> Initially it seems that the WHERE IN (...) approach takes a turn for the
>>> worse when the list gets very large.
>>> It also seems to do comparatively worse when the number of tokens is very
>>> small.
>>> But I can't claim any scientifically sound basis for making this
>>> distinction.
>>>
>>> Any experiences someone would like to share?
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>>> message can get through to the mailing list cleanly
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: explain analyze is your friend
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Listmail 2007-04-06 14:08:02 Re: SQL WHERE: many sql or large IN()
Previous Message tom 2007-04-06 13:23:02 Re: SQL WHERE: many sql or large IN()