Row value expression much faster than equivalent OR clauses

From: Steven Grimm <sgrimm(at)thesegovia(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Row value expression much faster than equivalent OR clauses
Date: 2016-12-28 16:58:24
Message-ID: 5863EF30.7050703@thesegovia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

A library my application is using does a "scan a batch at a time" loop
over a table of events, keeping track of its last position so it can
start the next query in the right place.

SELECT eventIdentifier, aggregateIdentifier, sequenceNumber, timeStamp,
payloadType,
payloadRevision, payload, metaData
FROM DomainEventEntry e
WHERE ((e.timeStamp > '2016-12-19T20:34:22.315Z')
OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
AND e.sequenceNumber > 0)
OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
AND e.sequenceNumber = 0
AND e.aggregateIdentifier >
'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'))
AND (type = 'transAggPrototype')
ORDER BY e.timeStamp ASC, e.sequenceNumber ASC, e.aggregateIdentifier ASC
LIMIT 100;

This uses the index on the three columns it's using for ordering of
events, but (if I'm reading the explain output correctly) does a full
scan of the index.

Limit (cost=0.55..1349.44 rows=100 width=576) (actual
time=526.814..527.238 rows=100 loops=1)
-> Index Scan using domainevententry_ts_seq_agg on domainevententry
e (cost=0.55..92494.44 rows=6857 width=576) (actual
time=526.811..527.035 rows=100 loops=1)
Filter: (((type)::text = 'transAggPrototype'::text) AND
((("timestamp")::text > '2016-12-19T20:34:22.315Z'::text) OR
((("timestamp")::text = '2016-12-19T20:34:22.315Z'::text) AND
(sequencenumber > 0)) OR ((("timestamp")::text =
'2016-12-19T20:34:22.315Z'::text) AND (sequencenumber = 0) AND
((aggregateidentifier)::text >
'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'::text))))
Rows Removed by Filter: 332183
Planning time: 1.893 ms
Execution time: 527.368 ms

I played around with it a little and one thing I tried was to
restructure the WHERE clause using a row value expression that's
semantically equivalent to the original.

SELECT eventIdentifier, aggregateIdentifier, sequenceNumber, timeStamp,
payloadType,
payloadRevision, payload, metaData
FROM DomainEventEntry e
WHERE (e.timeStamp, e.sequenceNumber, e.aggregateIdentifier) >
('2016-11-19T20:34:22.315Z', 0,
'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')
AND (type = 'transAggPrototype')
ORDER BY e.timeStamp ASC, e.sequenceNumber ASC, e.aggregateIdentifier ASC
LIMIT 100;

This ends up being a LOT faster:

Limit (cost=0.55..56.81 rows=100 width=576) (actual time=0.065..0.667
rows=100 loops=1)
-> Index Scan using domainevententry_ts_seq_agg on domainevententry
e (cost=0.55..65581.93 rows=116573 width=576) (actual time=0.062..0.437
rows=100 loops=1)
Index Cond: (ROW(("timestamp")::text, sequencenumber,
(aggregateidentifier)::text) > ROW('2016-11-19T20:34:22.315Z'::text, 0,
'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'::text))
Filter: ((type)::text = 'transAggPrototype'::text)
Rows Removed by Filter: 235
Planning time: 1.705 ms
Execution time: 0.795 ms

I wonder if the query planner could recognize that the two queries are
equivalent and choose the second plan for the OR-clause version, or at
least use the index more efficiently. This is on PostgreSQL 9.5.2.

-Steve

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Melvin Davidson 2016-12-28 18:13:06 Re: Securing Information
Previous Message Jan de Visser 2016-12-28 16:53:51 Re: Performance PLV8 vs PLPGSQL