From: | Andreas Joseph Krogh <andreas(at)visena(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value' |
Date: | 2014-08-11 20:19:00 |
Message-ID: | VisenaEmail.4c.3e4b549ec93661eb.147c6b40d3e@tc7-on |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
På mandag 11. august 2014 kl. 21:34:57, skrev Pavel Stehule <
pavel(dot)stehule(at)gmail(dot)com <mailto:pavel(dot)stehule(at)gmail(dot)com>>: Hi 2014-08-11
18:01 GMT+02:00 Andreas Joseph Krogh<andreas(at)visena(dot)com
<mailto:andreas(at)visena(dot)com>>: Hi folks, I have the following schema
(simplified for this example). create table folder(id integer primary key,
name varchar not null); create table document(id serial primary key, name
varchar not null, owner_id integer not null, folder_id integer references
folder(id)); create index document_owner_idx ON document(owner_id); create
index document_folder_idx ON document(folder_id); insert into folder(id,
name) values(1, 'Folder A'); insert into folder(id, name) values(2, 'Folder B');
insert into document(name, owner_id, folder_id) values('Document A', 1, 1);
insert into document(name, owner_id, folder_id) values('Document B', 1, NULL);
insert into document(name, owner_id, folder_id) values('Document C', 2, 2);
insert into document(name, owner_id, folder_id) values('Document D', 2, NULL);
select f.id <http://f.id>, f.name <http://f.name>, doc.id <http://doc.id>,
doc.owner_id,doc.name <http://doc.name> FROM document doc left outer join
folder f ON doc.folder_id =f.id <http://f.id> WHERE doc.folder_id is not null
OR doc.owner_id = 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.15..13.77 rows=4 width=76) (actual
time=0.031..0.045 rows=3 loops=1)
-> Seq Scan on document doc (cost=0.00..1.05 rows=4 width=44) (actual
time=0.012..0.018 rows=3 loops=1)
Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
Rows Removed by Filter: 1
-> Index Scan using folder_pkey on folder f (cost=0.15..3.17 rows=1
width=36) (actual time=0.005..0.006 rows=1 loops=3)
Index Cond: (doc.folder_id = id)
Planning time: 0.267 ms
Execution time: 0.094 ms
(8 rows) Is the a way to write a query which uses an index efficiently
for such a schema? I'd like to eliminate the Filter: ((folder_id IS NOT NULL)
OR (owner_id = 1)) and rather have "index cond" insted, is that possible?
your example is partially broken - ANALYZE and hashjoin and seqscan
penalization are missing - index scan is not used due too small table sizes
I tested 9.5, probably same as 9.4 and there indexes are used
postgres=# set enable_hashjoin to off;
SET
Time: 0.473 ms
postgres=# set enable_seqscan to off;
SET
Time: 0.904 ms
postgres=# explain select f.id <http://f.id>, f.name <http://f.name>, doc.id
<http://doc.id>, doc.owner_id, doc.name <http://doc.name>
FROM document doc left outer join folder f ON doc.folder_id = f.id
<http://f.id>
WHERE doc.folder_id is not null OR doc.owner_id = 1;
QUERY
PLAN
────────────────────────────────────────────────────────────────────────────────────────────────
Merge Left Join (cost=0.26..24.38 rows=3 width=32)
Merge Cond: (doc.folder_id = f.id <http://f.id>)
-> Index Scan using document_folder_idx on document doc
(cost=0.13..12.20 rows=3 width=23)
Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
-> Index Scan using folder_pkey on folder f (cost=0.13..12.16 rows=2
width=13)
Planning time: 0.663 ms
(6 rows)
default 9.2, 9.3, ...
postgres=# explain select f.id <http://f.id>, f.name <http://f.name>, doc.id
<http://doc.id>, doc.owner_id, doc.name <http://doc.name>
FROM document doc left outer join folder f ON doc.folder_id = f.id
<http://f.id>
WHERE doc.folder_id is not null OR doc.owner_id = 1;
QUERY PLAN
─────────────────────────────────────────────────────────────────────
Hash Left Join (cost=1.04..2.12 rows=3 width=32)
Hash Cond: (doc.folder_id = f.id <http://f.id>)
-> Seq Scan on document doc (cost=0.00..1.05 rows=3 width=23)
Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
-> Hash (cost=1.02..1.02 rows=2 width=13)
-> Seq Scan on folder f (cost=0.00..1.02 rows=2 width=13)
(6 rows)
and 9.2 after hashjoin and indexscan penalization
postgres=# explain select f.id <http://f.id>, f.name <http://f.name>, doc.id
<http://doc.id>, doc.owner_id, doc.name <http://doc.name>
FROM document doc left outer join folder f ON doc.folder_id = f.id
<http://f.id>
WHERE doc.folder_id is not null OR doc.owner_id = 1;
QUERY
PLAN
────────────────────────────────────────────────────────────────────────────────────────────────
Merge Left Join (cost=0.00..24.62 rows=3 width=32)
Merge Cond: (doc.folder_id = f.id <http://f.id>)
-> Index Scan using document_folder_idx on document doc
(cost=0.00..12.32 rows=3 width=23)
Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
-> Index Scan using folder_pkey on folder f (cost=0.00..12.28 rows=2
width=13)
(5 rows)
Time: 2.258 ms
What is your PostgreSQL?
Regards
Pavel
P.S. ten years ago I had a similar issue - "OR" predikates can be replaced
by UNION
you can try:
SELECT * FROM (SELECT * FROM doc
WHERE folder_id IS NOT NULL UNION SELECT * FROM doc WHERE owner_id = 1)
s LEFT JOIN folder ON s.folder_id = folder.id <http://folder.id>
or some similar magic select f.id <http://f.id>, f.name <http://f.name>,
doc.id <http://doc.id>, doc.owner_id, doc.name <http://doc.name> FROM document
doc left outer join folder f ON doc.folder_id =f.id <http://f.id> WHERE
doc.folder_id is not null OR doc.owner_id = 1; I see turning enable_seqscan
to off results in BitmapOr in 9.3: loff=# select version();
version
-------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro
4.8.1-10ubuntu9) 4.8.1, 64-bit
(1 row) loff=# show enable_seqscan ;
enable_seqscan
----------------
off
(1 row) loff=# explain analyze select f.id, f.name, doc.id, doc.owner_id,
doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=103.08..141.89 rows=1095 width=76) (actual
time=0.052..0.057 rows=3 loops=1)
Hash Cond: (doc.folder_id = f.id)
-> Bitmap Heap Scan on document doc (cost=21.10..44.85 rows=1095
width=44) (actual time=0.027..0.028 rows=3 loops=1)
Recheck Cond: ((folder_id IS NOT NULL) OR (owner_id = 1))
-> BitmapOr (cost=21.10..21.10 rows=1100 width=0) (actual
time=0.018..0.018 rows=0 loops=1)
-> Bitmap Index Scan on document_folder_idx
(cost=0.00..16.36 rows=1094 width=0) (actual time=0.012..0.012 rows=2 loops=1)
Index Cond: (folder_id IS NOT NULL)
-> Bitmap Index Scan on document_owner_idx (cost=0.00..4.20
rows=6 width=0) (actual time=0.005..0.005 rows=2 loops=1)
Index Cond: (owner_id = 1)
-> Hash (cost=66.60..66.60 rows=1230 width=36) (actual time=0.011..0.011
rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Index Scan using folder_pkey on folder f (cost=0.15..66.60
rows=1230 width=36) (actual time=0.005..0.006 rows=2 loops=1)
Total runtime: 0.125 ms
(13 rows) In 9.4-beta2 it results in an index-scan with a filter:
andreak=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 9.4beta2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit
(1 row) andreak=# show enable_seqscan ;
enable_seqscan
----------------
off
(1 row) andreak=# explain analyze select f.id, f.name, doc.id, doc.owner_id,
doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.28..18.92 rows=4 width=76) (actual
time=0.032..0.044 rows=3 loops=1)
-> Index Scan using document_folder_idx on document doc (cost=0.13..6.20
rows=4 width=44) (actual time=0.018..0.024 rows=3 loops=1)
Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
Rows Removed by Filter: 1
-> Index Scan using folder_pkey on folder f (cost=0.15..3.17 rows=1
width=36) (actual time=0.003..0.004 rows=1 loops=3)
Index Cond: (doc.folder_id = id)
Planning time: 0.260 ms
Execution time: 0.094 ms
(8 rows) I have quite large dataset (and some additional joins) in my
prod-data and hoped that I could solve this with one index-scan on an index on
document-table to avoid 2 index-scans and OR-ing the results. Thanks for help!
-- Andreas Joseph Krogh CTO / Partner - Visena AS Mobile: +47 909 56 963
andreas(at)visena(dot)com <mailto:andreas(at)visena(dot)com> www.visena.com
<https://www.visena.com> <https://www.visena.com>
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2014-08-11 20:29:17 | Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value' |
Previous Message | Andreas Joseph Krogh | 2014-08-11 20:08:01 | Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value' |