Quick Links

Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value'

From:	SENADIN <senadin2006(at)yahoo(dot)fr>
To:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Andreas Joseph Krogh <andreas(at)visena(dot)com>
Cc:	postgres list <pgsql-sql(at)postgresql(dot)org>
Subject:	Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value'
Date:	2014-08-11 19:49:32
Message-ID:	1407786572.35943.YahooMailNeo@web172205.mail.ir2.yahoo.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

Hello

I am looking for a funtion to calculate the size of a record in a table in a database. Thank you
Anned

Le Lundi 11 août 2014 14h37, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> a écrit :

2014-08-11 18:01 GMT+02:00 Andreas Joseph Krogh <andreas(at)visena(dot)com>:

Hi folks,
>
>I have the following schema (simplified for this example).
>
>create table folder(id integer primary key, name varchar not null);
>
>create table document(id serial primary key, name varchar not null, owner_id integer not null, folder_id integer references folder(id));
>create index document_owner_idx ON document(owner_id);
>create index document_folder_idx ON document(folder_id);
>
>insert into folder(id, name) values(1, 'Folder A');
>insert into folder(id, name) values(2, 'Folder B');
>insert into document(name, owner_id, folder_id) values('Document A', 1, 1);
>insert into document(name, owner_id, folder_id) values('Document B', 1, NULL);
>insert into document(name, owner_id, folder_id) values('Document C', 2, 2);
>insert into document(name, owner_id, folder_id) values('Document D', 2, NULL);
>
>select f.id, f.name, doc.id, doc.owner_id, doc.name
>FROM document doc left outer join folder f ON doc.folder_id = f.id
>WHERE doc.folder_id is not null OR doc.owner_id = 1;
>
>                                                         QUERY PLAN
>-----------------------------------------------------------------------------------------------------------------------------
> Nested Loop Left Join (cost=0.15..13.77 rows=4 width=76) (actual time=0.031..0.045 rows=3 loops=1)
>   -> Seq Scan on document doc (cost=0.00..1.05 rows=4 width=44) (actual time=0.012..0.018 rows=3 loops=1)
>         Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
>         Rows Removed by Filter: 1
>   -> Index Scan using folder_pkey on folder f (cost=0.15..3.17 rows=1 width=36) (actual time=0.005..0.006 rows=1 loops=3)
>         Index Cond: (doc.folder_id = id)
> Planning time: 0.267 ms
> Execution time: 0.094 ms
>(8 rows)
>
>
>Is the a way to write a query which uses an index efficiently for such a schema?
>
>I'd like to eliminate the Filter: ((folder_id IS NOT NULL) OR (owner_id = 1)) and rather have "index cond" insted, is that possible?

your example is partially broken - ANALYZE and hashjoin and seqscan penalization are missing - index scan is not used due too small table sizes

I tested 9.5, probably same as 9.4 and there indexes are used

postgres=# set enable_hashjoin to off;
SET
Time: 0.473 ms
postgres=# set enable_seqscan to off;
SET
Time: 0.904 ms
postgres=# explain select f.id, f.name, doc.id, doc.owner_id, doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;
                                           QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────
Merge Left Join (cost=0.26..24.38 rows=3 width=32)
   Merge Cond: (doc.folder_id = f.id)
   -> Index Scan using document_folder_idx on document doc (cost=0.13..12.20 rows=3 width=23)
         Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
   -> Index Scan using folder_pkey on folder f (cost=0.13..12.16 rows=2 width=13)
Planning time: 0.663 ms
(6 rows)

default 9.2, 9.3, ...

postgres=# explain select f.id, f.name, doc.id, doc.owner_id, doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;
                             QUERY PLAN
─────────────────────────────────────────────────────────────────────
Hash Left Join (cost=1.04..2.12 rows=3 width=32)
   Hash Cond: (doc.folder_id = f.id)
   -> Seq Scan on document doc (cost=0.00..1.05 rows=3 width=23)
         Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
   -> Hash (cost=1.02..1.02 rows=2 width=13)
         -> Seq Scan on folder f (cost=0.00..1.02 rows=2 width=13)
(6 rows)

and 9.2 after hashjoin and indexscan penalization

postgres=# explain select f.id, f.name, doc.id, doc.owner_id, doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;
                                           QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────
Merge Left Join (cost=0.00..24.62 rows=3 width=32)
   Merge Cond: (doc.folder_id = f.id)
   -> Index Scan using document_folder_idx on document doc (cost=0.00..12.32 rows=3 width=23)
         Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
   -> Index Scan using folder_pkey on folder f (cost=0.00..12.28 rows=2 width=13)
(5 rows)

Time: 2.258 ms

What is your PostgreSQL?

Regards

Pavel

P.S. ten years ago I had a similar issue - "OR" predikates can be replaced by UNION

you can try:

SELECT * FROM

(SELECT * FROM doc
WHERE folder_id IS NOT NULL

UNION

SELECT * FROM doc

WHERE owner_id = 1) s

LEFT JOIN folder ON s.folder_id = folder.id

or some similar magic

select f.id, f.name, doc.id, doc.owner_id, doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;

>--
>Andreas Joseph Krogh
>CTO / Partner - Visena AS
>Mobile: +47 909 56 963
>andreas(at)visena(dot)com
>www.visena.com

In response to

Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value' at 2014-08-11 19:34:57 from Pavel Stehule

Responses

Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value' at 2014-08-11 20:08:01 from Andreas Joseph Krogh

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Andreas Joseph Krogh	2014-08-11 20:08:01	Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value'
Previous Message	Pavel Stehule	2014-08-11 19:34:57	Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value'