Re: which is better: using OR clauses or UNION?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: adam_pgsql <adam_pgsql(at)witneyweb(dot)org>
Cc: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: which is better: using OR clauses or UNION?
Date: 2011-08-16 14:09:08
Message-ID: 26139.1313503748@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

adam_pgsql <adam_pgsql(at)witneyweb(dot)org> writes:
> I have a query hitting a table of 25 million rows. The table has a
> text field ('identifier') which i need to query for matching rows. The
> question is if i have multiple strings to match against this field I
> can use multiple OR sub-statements or multiple statements in a
> UNION. The UNION seems to run quicker.... is this to be expected?

Your test cases don't seem exactly comparable; in particular I think the
second one is benefiting from the first one having already read and
cached the relevant disk blocks. Notice how you've got, eg,

> -> Bitmap Index Scan on in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318 loops=1)
> Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 (6f24)'::character varying)

versus

> -> Bitmap Index Scan on in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual time=0.178..0.178 rows=318 loops=1)
> Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 (6f24)'::character varying)

Those are the exact same subplan, so any honest comparison should be
finding them to take the same amount of time. When the actual readings
are different by a factor of several hundred, there's something wrong
with your measurement process.

In the end this comes down to whether duplicates will be eliminated more
efficiently by a BitmapOr step or by sort/uniq on the resulting rows.
I'd have to bet on the BitmapOr myself, but it's likely that this is
down in the noise compared to the actual disk accesses in any
not-fully-cached scenario. Also, if you don't expect the sub-statements
to yield any duplicates, or don't care about seeing the same row twice
in the output, you should consider UNION ALL instead of UNION.

regards, tom lane

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Jasen Betts 2011-08-16 14:27:50 Re: which is better: using OR clauses or UNION?
Previous Message David Johnston 2011-08-16 14:06:56 Re: which is better: using OR clauses or UNION?