OR vs UNION vs UNION ALL

From: Etienne Champetier <champetier(dot)etienne(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: OR vs UNION vs UNION ALL
Date: 2015-09-23 13:38:45
Message-ID: CAOdf3gqf9Hhxk3g6qqkV6SkQEpM2JiARY45W0ATQ=JdqvCntLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I've reworked a bit an application to make 1 query (with many OR) instead
of thousands (2900 in this bench),
but was a bit disappointed with the perf. Here follow a quick bench and 2
questions at the end:

Here is my table
CREATE TABLE properties
(
item_id text NOT NULL,
calendar_id uuid NOT NULL,
clef text,
valeur text,
recurrence_id bigint,
CONSTRAINT fk_props_id FOREIGN KEY (item_id, calendar_id, recurrence_id)
REFERENCES cal_item (itemid, calendar_id, recurrence_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
)

Here are 3 versions of the "same" query, there is ~2900 conditions (OR,
UNION, UNION ALL), returning ~9000 lines (same exact number each time)
(i've ran the tests multiple time, using postgresql 9.3.5 on rhel 6, on an
idle physical server)

1)
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'1263d4ac-d8a3-44d7-b971-5eee947ff36a')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) OR
(ITEM_ID=((E'9a1e40bb-0280-4e79-ae13-ee57d614fdef')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) OR
.....

total time: 7.6s
plan:
Index Cond,
Bitmap index scan on fk_props_id,
BitmapOr,
Recheck Cond,
Bitmap Heap Scan on properties

2)
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'1263d4ac-d8a3-44d7-b971-5eee947ff36a')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) UNION
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'9a1e40bb-0280-4e79-ae13-ee57d614fdef')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) UNION
...

total time: 1s
plan:
Index Cond,
Index Scan using fki_props_id on properties properties_XXX,
Append,
HashAggregate

3)
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'1263d4ac-d8a3-44d7-b971-5eee947ff36a')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) UNION ALL
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'9a1e40bb-0280-4e79-ae13-ee57d614fdef')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) UNION ALL
...

total time: 1.2s
plan:
Index Cond,
Index Scan using fki_props_id on properties properties_XXX,
Append

Two questions:
1) Is it normal to have such a big difference between OR and UNION and
should i always prefer UNION when possible?

2) How can UNION ALL be slower than UNION, it's not doing the HashAggregate
but is 0.2s slower?

Thanks in advance
Etienne

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Dik 2015-09-23 14:29:24 Re: The curious case of two inserts, a shrinking xmax, and a ShareLock on transaction
Previous Message Daniel Begin 2015-09-23 11:37:22 Re: Advise on memory usage limitation by PostgreSQL on Windows