From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Andreas Joseph Krogh <andreas(at)visena(dot)com> |
Cc: | postgres list <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value' |
Date: | 2014-08-11 19:34:57 |
Message-ID: | CAFj8pRAZhCZ7P+B1Stamc5Gy78yFz3CdbTNrZPy7k7W30Ok45w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi
2014-08-11 18:01 GMT+02:00 Andreas Joseph Krogh <andreas(at)visena(dot)com>:
> Hi folks,
>
> I have the following schema (simplified for this example).
>
> create table folder(id integer primary key, name varchar not null);
>
> create table document(id serial primary key, name varchar not null,
> owner_id integer not null, folder_id integer references folder(id));
> create index document_owner_idx ON document(owner_id);
> create index document_folder_idx ON document(folder_id);
>
> insert into folder(id, name) values(1, 'Folder A');
> insert into folder(id, name) values(2, 'Folder B');
> insert into document(name, owner_id, folder_id) values('Document A', 1,
> 1);
> insert into document(name, owner_id, folder_id) values('Document B', 1,
> NULL);
> insert into document(name, owner_id, folder_id) values('Document C', 2,
> 2);
> insert into document(name, owner_id, folder_id) values('Document D', 2,
> NULL);
>
> select f.id, f.name, doc.id, doc.owner_id, doc.name
> FROM document doc left outer join folder f ON doc.folder_id = f.id
> WHERE doc.folder_id is not null OR doc.owner_id = 1;
>
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------
> Nested Loop Left Join (cost=0.15..13.77 rows=4 width=76) (actual
> time=0.031..0.045 rows=3 loops=1)
> -> Seq Scan on document doc (cost=0.00..1.05 rows=4 width=44) (actual
> time=0.012..0.018 rows=3 loops=1)
> Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
> Rows Removed by Filter: 1
> -> Index Scan using folder_pkey on folder f (cost=0.15..3.17 rows=1
> width=36) (actual time=0.005..0.006 rows=1 loops=3)
> Index Cond: (doc.folder_id = id)
> Planning time: 0.267 ms
> Execution time: 0.094 ms
> (8 rows)
>
>
> Is the a way to write a query which uses an index efficiently for such a
> schema?
>
> I'd like to eliminate the Filter: ((folder_id IS NOT NULL) OR (owner_id =
> 1)) and rather have "index cond" insted, is that possible?
>
your example is partially broken - ANALYZE and hashjoin and seqscan
penalization are missing - index scan is not used due too small table sizes
I tested 9.5, probably same as 9.4 and there indexes are used
postgres=# set enable_hashjoin to off;
SET
Time: 0.473 ms
postgres=# set enable_seqscan to off;
SET
Time: 0.904 ms
postgres=# explain select f.id, f.name, doc.id, doc.owner_id, doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;
QUERY
PLAN
────────────────────────────────────────────────────────────────────────────────────────────────
Merge Left Join (cost=0.26..24.38 rows=3 width=32)
Merge Cond: (doc.folder_id = f.id)
-> Index Scan using document_folder_idx on document doc
(cost=0.13..12.20 rows=3 width=23)
Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
-> Index Scan using folder_pkey on folder f (cost=0.13..12.16 rows=2
width=13)
Planning time: 0.663 ms
(6 rows)
default 9.2, 9.3, ...
postgres=# explain select f.id, f.name, doc.id, doc.owner_id, doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;
QUERY PLAN
─────────────────────────────────────────────────────────────────────
Hash Left Join (cost=1.04..2.12 rows=3 width=32)
Hash Cond: (doc.folder_id = f.id)
-> Seq Scan on document doc (cost=0.00..1.05 rows=3 width=23)
Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
-> Hash (cost=1.02..1.02 rows=2 width=13)
-> Seq Scan on folder f (cost=0.00..1.02 rows=2 width=13)
(6 rows)
and 9.2 after hashjoin and indexscan penalization
postgres=# explain select f.id, f.name, doc.id, doc.owner_id, doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;
QUERY
PLAN
────────────────────────────────────────────────────────────────────────────────────────────────
Merge Left Join (cost=0.00..24.62 rows=3 width=32)
Merge Cond: (doc.folder_id = f.id)
-> Index Scan using document_folder_idx on document doc
(cost=0.00..12.32 rows=3 width=23)
Filter: ((folder_id IS NOT NULL) OR (owner_id = 1))
-> Index Scan using folder_pkey on folder f (cost=0.00..12.28 rows=2
width=13)
(5 rows)
Time: 2.258 ms
What is your PostgreSQL?
Regards
Pavel
P.S. ten years ago I had a similar issue - "OR" predikates can be replaced
by UNION
you can try:
SELECT * FROM
(SELECT * FROM doc
WHERE folder_id IS NOT NULL
UNION
SELECT * FROM doc
WHERE owner_id = 1) s
LEFT JOIN folder ON s.folder_id = folder.id
or some similar magic
select f.id, f.name, doc.id, doc.owner_id, doc.name
FROM document doc left outer join folder f ON doc.folder_id = f.id
WHERE doc.folder_id is not null OR doc.owner_id = 1;
>
> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas(at)visena(dot)com
> www.visena.com
> <https://www.visena.com>
>
From | Date | Subject | |
---|---|---|---|
Next Message | SENADIN | 2014-08-11 19:49:32 | Re: How to optimize WHERE column_a IS NOT NULL OR column_b = 'value' |
Previous Message | Adrian Klaver | 2014-08-11 19:01:20 | Re: function call |