From: | Mathias Kunter <mathiaskunter(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #14107: Major query planner bug regarding subqueries and indices |
Date: | 2016-05-11 11:55:07 |
Message-ID: | fad91c29-5250-f856-2347-61122d4901ab@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Sorry for bumping this one more time, but I'd like to share some more
real-life performance test results of using ANY(ARRAY(...)) instead of
IN(...), hoping that you'd maybe still consider implementing such an
optimization into the query planner. Since the test results indicate
that the performance boost can really be massive on certain query types
(factor 1000), I think that it'd really be worth the work.
===== Test setup =====
The tables "mb.release" and "mb.release_group" both contain about 1.5
million rows of real data, taken from the MusicBrainz database, and are
of course properly indexed. All performance tests have been repeated a
few times to be comparable.
The test covers subqueries which return just a few rows and also
subqueries which return more than 100000 rows. The queries test the
performance of IN vs. ANY(ARRAY()) when used in different scenarios.
For reference, the full query plans of all used queries are linked below.
===== Tested queries =====
1) SELECT id FROM mb.release WHERE release_group IN (SELECT id FROM
mb.release_group WHERE name = 'Bear');
2) SELECT id FROM mb.release WHERE release_group IN (SELECT id FROM
mb.release_group WHERE name < 'Bear');
3) SELECT id FROM mb.release WHERE name = 'Tiger' OR release_group IN
(SELECT id FROM mb.release_group WHERE name = 'Bear');
4) SELECT id FROM mb.release WHERE name = 'Tiger' OR release_group IN
(SELECT id FROM mb.release_group WHERE name < 'Bear');
5) SELECT id FROM mb.release WHERE name = 'Tiger' AND release_group IN
(SELECT id FROM mb.release_group WHERE name = 'Bear');
6) SELECT id FROM mb.release WHERE name = 'Tiger' AND release_group IN
(SELECT id FROM mb.release_group WHERE name < 'Bear');
===== Test results =====
All numbers are given in milliseconds and show the total query time
(planning + execution).
-------------------------------------------
| Query | IN (...) | = ANY(ARRAY(...)) |
-------------------------------------------
| 1 | 0.7 | 0.4 |
| 2 | 6001.1 | 2517.8 |
| 3 | 711.3 | 0.5 |
| 4 | > 1000000.0 | 1962.6 |
| 5 | 0.8 | 0.5 |
| 6 | 0.9 | 492.7 |
-------------------------------------------
Note: Query 4 using the IN operator has been canceled after running for
more than 15 minutes.
===== Full query plans =====
For reference, all query plans of this performance test have been
recorded using EXPLAIN (ANALYZE, BUFFERS). Please find them at
http://pastebin.com/zymkbcSf
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2016-05-11 13:56:24 | Re: BUG #14107: Major query planner bug regarding subqueries and indices |
Previous Message | Hari Prasad | 2016-05-11 09:48:08 | Installation error/bug |