Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value'

From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value'
Date: 2014-08-11 20:08:01
Message-ID: VisenaEmail.4b.ce3097d925043f69.147c6af1bcb@tc7-on
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Start a new thread   På mandag 11. august 2014 kl. 21:49:32, skrev SENADIN <
senadin2006(at)yahoo(dot)fr <mailto:senadin2006(at)yahoo(dot)fr>>: Hello I am looking for a
funtion to calculate the size of a record in a table in a database. Thank you
Anned
  Le Lundi 11 août 2014 14h37, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> a
écrit :   Hi   2014-08-11 18:01 GMT+02:00 Andreas Joseph Krogh <
andreas(at)visena(dot)com <mailto:andreas(at)visena(dot)com>>: Hi folks,   I have the
following schema (simplified for this example).   create table folder(id
integer primary key, name varchar not null);   create table document(id serial
primary key, name varchar not null, owner_id integer not null, folder_id
integer references folder(id)); create index document_owner_idx ON
document(owner_id); create index document_folder_idx ON document(folder_id);  
insert into folder(id, name) values(1, 'Folder A'); insert into folder(id,
name) values(2, 'Folder B'); insert into document(name, owner_id, folder_id)
values('Document A',  1, 1); insert into document(name, owner_id, folder_id)
values('Document B',  1, NULL); insert into document(name, owner_id, folder_id)
values('Document C',  2, 2); insert into document(name, owner_id, folder_id)
values('Document D',  2, NULL);   select f.id <http://f.id/>, f.name
<http://f.name/>, doc.id <http://doc.id/>, doc.owner_id, doc.name
<http://doc.name/> FROM document doc left outer join folder f ON doc.folder_id =
f.id <http://f.id/> WHERE doc.folder_id is not null OR doc.owner_id = 1;  
                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.15..13.77 rows=4 width=76) (actual
time=0.031..0.045 rows=3 loops=1)
   ->  Seq Scan on document doc  (cost=0.00..1.05 rows=4 width=44) (actual
time=0.012..0.018 rows=3 loops=1)
         Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
         Rows Removed by Filter: 1
   ->  Index Scan using folder_pkey on folder f  (cost=0.15..3.17 rows=1
width=36) (actual time=0.005..0.006 rows=1 loops=3)
         Index Cond: (doc.folder_id = id)
 Planning time: 0.267 ms
 Execution time: 0.094 ms
(8 rows)     Is the a way to write a query which uses an index efficiently
for such a schema?   I'd like to eliminate the Filter: ((folder_id IS NOT NULL)
OR (owner_id = 1)) and rather have "index cond" insted, is that possible?  
your example is partially broken - ANALYZE and hashjoin and seqscan
penalization are missing - index scan is not used due too small table sizes
  I tested 9.5, probably same as 9.4 and there indexes are used

postgres=# set enable_hashjoin to off;
SET
Time: 0.473 ms
postgres=# set enable_seqscan to off;
SET
Time: 0.904 ms
postgres=# explain select f.id <http://f.id/>, f.name <http://f.name/>, doc.id
<http://doc.id/>, doc.owner_id, doc.name <http://doc.name/>
FROM document doc left outer join folder f ON doc.folder_id = f.id
<http://f.id/>
WHERE doc.folder_id is not null OR doc.owner_id = 1;
                                           QUERY
PLAN                                          

────────────────────────────────────────────────────────────────────────────────────────────────
 Merge Left Join  (cost=0.26..24.38 rows=3 width=32)
   Merge Cond: (doc.folder_id = f.id <http://f.id/>)
   ->  Index Scan using document_folder_idx on document doc 
(cost=0.13..12.20 rows=3 width=23)
         Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
   ->  Index Scan using folder_pkey on folder f  (cost=0.13..12.16 rows=2
width=13)
 Planning time: 0.663 ms
(6 rows)
  default 9.2, 9.3, ...

postgres=# explain select f.id <http://f.id/>, f.name <http://f.name/>, doc.id
<http://doc.id/>, doc.owner_id, doc.name <http://doc.name/>
FROM document doc left outer join folder f ON doc.folder_id = f.id
<http://f.id/>
WHERE doc.folder_id is not null OR doc.owner_id = 1;
                             QUERY PLAN                             
─────────────────────────────────────────────────────────────────────
 Hash Left Join  (cost=1.04..2.12 rows=3 width=32)
   Hash Cond: (doc.folder_id = f.id <http://f.id/>)
   ->  Seq Scan on document doc  (cost=0.00..1.05 rows=3 width=23)
         Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
   ->  Hash  (cost=1.02..1.02 rows=2 width=13)
         ->  Seq Scan on folder f  (cost=0.00..1.02 rows=2 width=13)
(6 rows)
  and 9.2 after hashjoin and indexscan penalization

postgres=# explain select f.id <http://f.id/>, f.name <http://f.name/>, doc.id
<http://doc.id/>, doc.owner_id, doc.name <http://doc.name/>
FROM document doc left outer join folder f ON doc.folder_id = f.id
<http://f.id/>
WHERE doc.folder_id is not null OR doc.owner_id = 1;
                                           QUERY
PLAN                                          

────────────────────────────────────────────────────────────────────────────────────────────────
 Merge Left Join  (cost=0.00..24.62 rows=3 width=32)
   Merge Cond: (doc.folder_id = f.id <http://f.id/>)
   ->  Index Scan using document_folder_idx on document doc 
(cost=0.00..12.32 rows=3 width=23)
         Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
   ->  Index Scan using folder_pkey on folder f  (cost=0.00..12.28 rows=2
width=13)
(5 rows)

Time: 2.258 ms
  What is your PostgreSQL?
  Regards
  Pavel
  P.S. ten years ago I had a similar issue - "OR" predikates can be replaced
by UNION
  you can try:
  SELECT * FROM (SELECT * FROM doc
  WHERE folder_id IS NOT NULL UNION   SELECT * FROM doc   WHERE owner_id = 1)
s   LEFT JOIN folder ON s.folder_id = folder.id <http://folder.id/>
  or some similar magic     select f.id <http://f.id/>, f.name
<http://f.name/>, doc.id <http://doc.id/>, doc.owner_id, doc.name
<http://doc.name/> FROM document doc left outer join folder f ON doc.folder_id =
f.id <http://f.id/> WHERE doc.folder_id is not null OR doc.owner_id = 1;      
--Andreas Joseph Krogh CTO / Partner - Visena AS Mobile: +47 909 56 963 <>
andreas(at)visena(dot)com <mailto:andreas(at)visena(dot)com> www.visena.com
<https://www.visena.com/> <https://www.visena.com/>    
        -- Andreas Joseph Krogh CTO / Partner - Visena AS Mobile: +47 909 56
963 andreas(at)visena(dot)com <mailto:andreas(at)visena(dot)com> www.visena.com
<https://www.visena.com> <https://www.visena.com>  

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Andreas Joseph Krogh 2014-08-11 20:19:00 Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value'
Previous Message SENADIN 2014-08-11 19:49:32 Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value'