Optimise OR condiditions across multiple joins

From: Andrew Beverley <andy(at)andybev(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Optimise OR condiditions across multiple joins
Date: 2016-06-09 06:10:34
Message-ID: 20160609071034.93b9227e59ced514cb9669d8@andybev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Dear all,

Is there a way to efficiently perform OR conditions across multiple
joins?

For example, I have the following statement:

SELECT RECORD.id
FROM RECORD
left join string
ON string.record_id = RECORD.id
AND string.layout_id = 6
left join DATE
ON DATE.record_id = RECORD.id
AND DATE.layout_id = 8
WHERE ( string.value_index = 'beverley'
OR DATE.value = '1980-11-16' );

Which when analysed produces the output at the end of this email. It
appears to be doing a slow filter across all values, once it's
completed the joins across all rows.

If I change the OR condition to an AND condition, then indexes are used
to good effect and the query is very fast.

Does anyone have any advice as to how I can efficiently use indexes for
the OR condition and substantially improve the query time? I can
provide full details of all my current indexes if that helps.

PostgreSQL 9.4.8 on Debian 8.5

Thanks,

Andy

Hash Right Join (cost=176359.93..266784.76 rows=135 width=8) (actual time=17189.826..17845.939 rows=15 loops=1)
Hash Cond: (string.record_id = record.id)
Filter: (((string.value_index)::text = 'beverley'::text) OR (date.value = '1980-11-16'::date))
Rows Removed by Filter: 2094748
-> Bitmap Heap Scan on string (cost=6683.45..76441.68 rows=305938 width=19) (actual time=45.417..681.269 rows=285099 loops=1)
Recheck Cond: (layout_id = 6)
Heap Blocks: exact=19548
-> Bitmap Index Scan on string_idx_layout_id (cost=0.00..6606.97 rows=305938 width=0) (actual time=39.527..39.527 rows=285103 loops=1)
Index Cond: (layout_id = 6)
-> Hash (cost=133376.36..133376.36 rows=2088250 width=12) (actual time=15917.540..15917.540 rows=2094763 loops=1)
Buckets: 16384 Batches: 32 Memory Usage: 2611kB
-> Hash Right Join (cost=71333.62..133376.36 rows=2088250 width=12) (actual time=7041.758..12268.693 rows=2094763 loops=1)
Hash Cond: (date.record_id = record.id)
-> Seq Scan on date (cost=0.00..46095.40 rows=285676 width=12) (actual time=0.871..769.841 rows=285099 loops=1)
Filter: (layout_id = 8)
Rows Removed by Filter: 855297
-> Hash (cost=37072.50..37072.50 rows=2088250 width=8) (actual time=7032.415..7032.415 rows=2094763 loops=1)
Buckets: 16384 Batches: 32 Memory Usage: 2576kB
-> Seq Scan on record (cost=0.00..37072.50 rows=2088250 width=8) (actual time=0.037..3398.638 rows=2094763 loops=1)

Browse pgsql-general by date

  From Date Subject
Next Message Greg Navis 2016-06-09 08:57:11 Re: [pg_trgm] Making similarity(?, ?) < ? use an index
Previous Message John R Pierce 2016-06-09 04:50:23 Re: Automate copy - Postgres 9.2