Quick Links

Re: optimizing a query

From:	Jonathan Vanasco <postgres(at)2xlp(dot)com>
To:	David G(dot) Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc:	Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject:	Re: optimizing a query
Date:	2016-06-22 00:07:11
Message-ID:	9D705980-D6C4-4B37-B101-9BA6CE5F169E@2xlp.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Jun 21, 2016, at 6:55 PM, David G. Johnston wrote:

> Aside from the name these indexes are identical...

sorry. tired eyes copy/pasting between windows and trying to 'average' out 40 similar queries.

> These two items combined reduce the desirability of diagnosing this...it doesn't seem like you've faithfully recreated the scenario for us to evaluate.
>
> Your post is also not self-contained and you haven't provided the actual EXPLAINs you are getting.

I played around with some more indexes, creating and disabling them on one specific query
Eventually i found some index formats that didn't pull in the whole table.
They gave approximately the same results as the other selects, with some differences in reporting. the heap scan on the table was negligible. the big hit was off the outer hash join.
the formatting in explain made a negligible check look like it was the root issue

CREATE TABLE t_a (id SERIAL PRIMARY KEY,
col_1 INT NOT NULL,
col_2 BOOLEAN DEFAULT NULL
);
CREATE INDEX test_idx__t_a_col1_col2__v1 on t_a (col_1) WHERE col_2 IS NOT FALSE;
CREATE INDEX test_idx__t_a_col1_col2__v2 on t_a (col_1, id) WHERE col_2 IS NOT FALSE;
CREATE INDEX test_idx__t_a_col1_col2__v3 on t_a (id, col_1) WHERE col_2 IS NOT FALSE;
CREATE INDEX test_idx__t_a_col1_col2__v4 on t_a (id, col_1, col_2) WHERE col_2 IS NOT FALSE;
CREATE INDEX test_idx__t_a_col1_col2__v5 on t_a (col_1, col_2) WHERE col_2 IS NOT FALSE;

CREATE TABLE t_b (id SERIAL PRIMARY KEY,
col_1 INT NOT NULL,
col_2 BOOLEAN DEFAULT NULL
);
CREATE TABLE t_a2b (a_id INT NOT NULL REFERENCES t_a(id),
b_id INT NOT NULL REFERENCES t_b(id),
col_a INT NOT NULL,
PRIMARY KEY (a_id, b_id)
);

EXPLAIN ANALYZE
SELECT t_a2b.b_id AS t_a2b_b_id,
count(t_a2b.b_id) AS counted
FROM t_a2b
JOIN t_a ON t_a2b.a_id = t_a.id
WHERE t_a.col_1 = 730
AND t_a2b.col_a = 1
AND (t_a.col_2 IS NOT False)
GROUP BY t_a2b.b_id
ORDER BY counted DESC,
t_a2b.b_id ASC
LIMIT 25
OFFSET 0
;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=270851.55..270851.62 rows=25 width=4) (actual time=1259.950..1259.953 rows=25 loops=1)
-> Sort (cost=270851.55..270863.43 rows=4750 width=4) (actual time=1259.945..1259.945 rows=25 loops=1)
Sort Key: (count(t_a2b.b_id)) DESC, t_a2b.b_id
Sort Method: top-N heapsort Memory: 26kB
-> HashAggregate (cost=270670.01..270717.51 rows=4750 width=4) (actual time=1259.430..1259.769 rows=1231 loops=1)
Group Key: t_a2b.b_id
-> Hash Join (cost=171148.45..270516.71 rows=30660 width=4) (actual time=107.662..1230.481 rows=124871 loops=1)
Hash Cond: (t_a2b.a_id = t_a.id)
-> Seq Scan on t_a2b (cost=0.00..89741.18 rows=2485464 width=8) (actual time=0.011..661.978 rows=2492783 loops=1)
Filter: (col_a = 1)
Rows Removed by Filter: 2260712
-> Hash (cost=170446.87..170446.87 rows=56126 width=4) (actual time=107.409..107.409 rows=48909 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2232kB
-> Bitmap Heap Scan on t_a (cost=1055.41..170446.87 rows=56126 width=4) (actual time=18.243..94.470 rows=48909 loops=1)
Recheck Cond: ((col_1 = 730) AND (col_2 IS NOT FALSE))
Heap Blocks: exact=43972
-> Bitmap Index Scan on test_idx__t_a_col1_col2__v2 (cost=0.00..1041.38 rows=56126 width=0) (actual time=8.661..8.661 rows=48909 loops=1)
Index Cond: (col_1 = 730)
Planning time: 0.796 ms
Execution time: 1260.092 ms

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=208239.59..208239.65 rows=25 width=4) (actual time=1337.739..1337.743 rows=25 loops=1)
-> Sort (cost=208239.59..208251.47 rows=4750 width=4) (actual time=1337.737..1337.739 rows=25 loops=1)
Sort Key: (count(t_a2b.b_id)) DESC, t_a2b.b_id
Sort Method: top-N heapsort Memory: 26kB
-> HashAggregate (cost=208058.05..208105.55 rows=4750 width=4) (actual time=1337.183..1337.556 rows=1231 loops=1)
Group Key: t_a2b.b_id
-> Hash Join (cost=108628.33..207935.37 rows=24537 width=4) (actual time=173.116..1306.910 rows=124871 loops=1)
Hash Cond: (t_a2b.a_id = t_a.id)
-> Seq Scan on t_a2b (cost=0.00..89741.18 rows=2485464 width=8) (actual time=0.010..669.616 rows=2492783 loops=1)
Filter: (col_a = 1)
Rows Removed by Filter: 2260712
-> Hash (cost=108066.87..108066.87 rows=44917 width=4) (actual time=172.884..172.884 rows=48909 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2232kB
-> Index Only Scan using test_idx__t_a_col1_col2__v4 on t_a (cost=0.43..108066.87 rows=44917 width=4) (actual time=0.031..160.088 rows=48909 loops=1)
Index Cond: (col_1 = 730)
Heap Fetches: 2426
Planning time: 0.769 ms
Execution time: 1337.861 ms

In response to

Re: optimizing a query at 2016-06-21 22:55:51 from David G. Johnston

Responses

Re: optimizing a query at 2016-06-22 08:25:59 from Erik Gustafson

Browse pgsql-general by date

	From	Date	Subject
Next Message	Job	2016-06-22 07:38:14	R: Vacuum full: alternatives?
Previous Message	Tim Smith	2016-06-21 23:03:55	Re: Help needed structuring Postgresql correlation query