Re: which is better: using OR clauses or UNION?

From: Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com>
To: adam_pgsql <adam_pgsql(at)witneyweb(dot)org>
Cc: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: which is better: using OR clauses or UNION?
Date: 2011-08-16 11:56:34
Message-ID: CAJu1cLacsu9KiL62=KC4dBj9dTpJwUWoaWVk9ni1qrEjq+GEww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

hi Adam,
im not sure which is faster/slower but, possibly you can speed it up by
using "in" operator

...where lower(identifier) *in *(lower('BUGS0000001884677')*
,*lower('BUGS0000001884678')
*,*....);

if you create function based index:

CREATE INDEX idx_table_lower_text ON table(*lower*(text_field));

(taken from: http://archives.postgresql.org/pgsql-sql/2003-09/msg00395.php)

On Tue, Aug 16, 2011 at 1:39 PM, adam_pgsql <adam_pgsql(at)witneyweb(dot)org>wrote:

>
> Hi,
>
> I have a query hitting a table of 25 million rows. The table has a text
> field ('identifier') which i need to query for matching rows. The question
> is if i have multiple strings to match against this field I can use multiple
> OR sub-statements or multiple statements in a UNION. The UNION seems to run
> quicker.... is this to be expected? or is there anything else I can do
> improve the speed of this query? Some query details:
>
>
> table "dba_data_base", index:
> "in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)
>
>
> Query 1
> -------
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> ( lower(identifier) LIKE lower('BUGS0000001884677') OR
> lower(identifier) LIKE lower('BUGS0000001884678') OR
> lower(identifier) LIKE lower('BUGS0000001884679') OR
> lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
> ORDER BY a.identifier;
>
> QUERY PLAN
>
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> --------------------------------
> Sort (cost=14930.13..14939.77 rows=3857 width=62) (actual
> time=3208.466..3208.652 rows=318 loops=1)
> Sort Key: identifier
> -> Bitmap Heap Scan on dba_data_base a (cost=134.43..14700.38 rows=3857
> width=62) (actual time=81.106..3207.721 rows=318 loops=1)
> Recheck Cond: ((lower(identifier) ~~ 'bugs0000001884677'::text) OR
> (lower(identifier) ~~ 'bugs0000001884678'::text) OR (lower(identifier) ~~
> 'bugs0000001884679'::text) OR (lower(identifier) ~
> ~ 'sptigr4-2210 (6f24)'::text))
> Filter: ((lower(identifier) ~~ 'bugs0000001884677'::text) OR
> (lower(identifier) ~~ 'bugs0000001884678'::text) OR (lower(identifier) ~~
> 'bugs0000001884679'::text) OR (lower(identifier) ~~ 'spt
> igr4-2210 (6f24)'::text))
> -> BitmapOr (cost=134.43..134.43 rows=3857 width=0) (actual time=
> 71.397..71.397 rows=0 loops=1)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0
> loops=1)
> Index Cond: (lower(identifier) ~=~
> 'bugs0000001884677'::character varying)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
> Index Cond: (lower(identifier) ~=~
> 'bugs0000001884678'::character varying)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
> Index Cond: (lower(identifier) ~=~
> 'bugs0000001884679'::character varying)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318
> loops=1)
> Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
> Total runtime: 3208.904 ms
>
>
> Query 2
> -------
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('BUGS0000001884677')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('BUGS0000001884678')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('BUGS0000001884679')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
> ORDER BY identifier;
>
> Sort (cost=15702.26..15711.90 rows=3856 width=62) (actual
> time=3.688..3.886 rows=317 loops=1)
> Sort Key: identifier
> -> Unique (cost=15414.74..15472.58 rows=3856 width=62) (actual
> time=2.663..3.387 rows=317 loops=1)
> -> Sort (cost=15414.74..15424.38 rows=3856 width=62) (actual
> time=2.660..2.834 rows=318 loops=1)
> Sort Key: bioassay_id, identifier, ratio, log_ratio, p_value
> -> Append (cost=32.88..15185.06 rows=3856 width=62) (actual
> time=0.320..2.131 rows=318 loops=1)
> -> Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 rows=0
> loops=1)
> Filter: (lower(identifier) ~~
> 'bugs0000001884677'::text)
> -> Bitmap Index Scan on
> in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual
> time=0.036..0.036 rows=0 loops=1)
> Index Cond: (lower(identifier) ~=~
> 'bugs0000001884677'::character varying)
> -> Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.010..0.010 rows=0
> loops=1)
> Filter: (lower(identifier) ~~
> 'bugs0000001884678'::text)
> -> Bitmap Index Scan on
> in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual
> time=0.008..0.008 rows=0 loops=1)
> Index Cond: (lower(identifier) ~=~
> 'bugs0000001884678'::character varying)
> -> Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.010..0.010 rows=0
> loops=1)
> Filter: (lower(identifier) ~~
> 'bugs0000001884679'::text)
> -> Bitmap Index Scan on
> in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual
> time=0.008..0.008 rows=0 loops=1)
> Index Cond: (lower(identifier) ~=~
> 'bugs0000001884679'::character varying)
> -> Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.255..1.676 rows=318
> loops=1)
> Filter: (lower(identifier) ~~ 'sptigr4-2210
> (6f24)'::text)
> -> Bitmap Index Scan on
> in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual
> time=0.178..0.178 rows=318 loops=1)
> Index Cond: (lower(identifier) ~=~
> 'sptigr4-2210 (6f24)'::character varying)
> Total runtime: 4.174 ms
>
> Also which should scale better if I add more strings to match? would there
> be any better design patterns for this problem?
>
> Thanks for any help
>
> Adam
>
> select version();
> version
> ----------------------------------------------------------------
> PostgreSQL 8.2.12 on i686-pc-linux-gnu, compiled by GCC 2.95.4
>
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>

--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Adrian Klaver 2011-08-16 13:43:54 Re: bigint and unix time
Previous Message adam_pgsql 2011-08-16 11:39:09 which is better: using OR clauses or UNION?