Re: optimizing a query

From: Erik Gustafson <gustafson(dot)erik(at)gmail(dot)com>
To: Jonathan Vanasco <postgres(at)2xlp(dot)com>, PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: optimizing a query
Date: 2016-06-22 08:25:59
Message-ID: CABrb_G_7m9J3_KdLpURYhsTWKWj0Jxc-w8EAD9LyrRyrPKrgnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

don't you want an index on t_a2b.col_a, maybe partial where col_a=1 ?

On Wed, Jun 22, 2016 at 2:07 AM, Jonathan Vanasco <postgres(at)2xlp(dot)com> wrote:

>
> On Jun 21, 2016, at 6:55 PM, David G. Johnston wrote:
>
> ​Aside from the name these indexes are identical...​
>
>
> sorry. tired eyes copy/pasting between windows and trying to 'average' out
> 40 similar queries.
>
> ​These two items combined reduce the desirability of diagnosing this...it
> doesn't seem like you've faithfully recreated the scenario for us to
> evaluate.
>
> Your post is also not self-contained and you haven't provided the actual
> EXPLAINs you are getting.
>
>
> I played around with some more indexes, creating and disabling them on one
> specific query
> Eventually i found some index formats that didn't pull in the whole table.
> They gave approximately the same results as the other selects, with some
> differences in reporting. the heap scan on the table was negligible. the
> big hit was off the outer hash join.
> the formatting in explain made a negligible check look like it was the
> root issue
>
>
> CREATE TABLE t_a (id SERIAL PRIMARY KEY,
> col_1 INT NOT NULL,
> col_2 BOOLEAN DEFAULT NULL
> );
> CREATE INDEX test_idx__t_a_col1_col2__v1 on t_a (col_1) WHERE col_2 IS NOT
> FALSE;
> CREATE INDEX test_idx__t_a_col1_col2__v2 on t_a (col_1, id) WHERE col_2 IS
> NOT FALSE;
> CREATE INDEX test_idx__t_a_col1_col2__v3 on t_a (id, col_1) WHERE col_2 IS
> NOT FALSE;
> CREATE INDEX test_idx__t_a_col1_col2__v4 on t_a (id, col_1, col_2) WHERE
> col_2 IS NOT FALSE;
> CREATE INDEX test_idx__t_a_col1_col2__v5 on t_a (col_1, col_2) WHERE col_2
> IS NOT FALSE;
>
> CREATE TABLE t_b (id SERIAL PRIMARY KEY,
> col_1 INT NOT NULL,
> col_2 BOOLEAN DEFAULT NULL
> );
> CREATE TABLE t_a2b (a_id INT NOT NULL REFERENCES t_a(id),
> b_id INT NOT NULL REFERENCES t_b(id),
> col_a INT NOT NULL,
> PRIMARY KEY (a_id, b_id)
> );
>
> EXPLAIN ANALYZE
> SELECT t_a2b.b_id AS t_a2b_b_id,
> count(t_a2b.b_id) AS counted
> FROM t_a2b
> JOIN t_a ON t_a2b.a_id = t_a.id
> WHERE t_a.col_1 = 730
> AND t_a2b.col_a = 1
> AND (t_a.col_2 IS NOT False)
> GROUP BY t_a2b.b_id
> ORDER BY counted DESC,
> t_a2b.b_id ASC
> LIMIT 25
> OFFSET 0
> ;
>
>
>
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=270851.55..270851.62 rows=25 width=4) (actual
> time=1259.950..1259.953 rows=25 loops=1)
> -> Sort (cost=270851.55..270863.43 rows=4750 width=4) (actual
> time=1259.945..1259.945 rows=25 loops=1)
> Sort Key: (count(t_a2b.b_id)) DESC, t_a2b.b_id
> Sort Method: top-N heapsort Memory: 26kB
> -> HashAggregate (cost=270670.01..270717.51 rows=4750 width=4)
> (actual time=1259.430..1259.769 rows=1231 loops=1)
> Group Key: t_a2b.b_id
> -> Hash Join (cost=171148.45..270516.71 rows=30660
> width=4) (actual time=107.662..1230.481 rows=124871 loops=1)
> Hash Cond: (t_a2b.a_id = t_a.id)
> -> Seq Scan on t_a2b (cost=0.00..89741.18
> rows=2485464 width=8) (actual time=0.011..661.978 rows=2492783 loops=1)
> Filter: (col_a = 1)
> Rows Removed by Filter: 2260712
> -> Hash (cost=170446.87..170446.87 rows=56126
> width=4) (actual time=107.409..107.409 rows=48909 loops=1)
> Buckets: 65536 Batches: 1 Memory Usage: 2232kB
> -> Bitmap Heap Scan on t_a
> (cost=1055.41..170446.87 rows=56126 width=4) (actual time=18.243..94.470
> rows=48909 loops=1)
> Recheck Cond: ((col_1 = 730) AND (col_2
> IS NOT FALSE))
> Heap Blocks: exact=43972
> -> Bitmap Index Scan on
> test_idx__t_a_col1_col2__v2 (cost=0.00..1041.38 rows=56126 width=0)
> (actual time=8.661..8.661 rows=48909 loops=1)
> Index Cond: (col_1 = 730)
> Planning time: 0.796 ms
> Execution time: 1260.092 ms
>
>
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=208239.59..208239.65 rows=25 width=4) (actual
> time=1337.739..1337.743 rows=25 loops=1)
> -> Sort (cost=208239.59..208251.47 rows=4750 width=4) (actual
> time=1337.737..1337.739 rows=25 loops=1)
> Sort Key: (count(t_a2b.b_id)) DESC, t_a2b.b_id
> Sort Method: top-N heapsort Memory: 26kB
> -> HashAggregate (cost=208058.05..208105.55 rows=4750 width=4)
> (actual time=1337.183..1337.556 rows=1231 loops=1)
> Group Key: t_a2b.b_id
> -> Hash Join (cost=108628.33..207935.37 rows=24537
> width=4) (actual time=173.116..1306.910 rows=124871 loops=1)
> Hash Cond: (t_a2b.a_id = t_a.id)
> -> Seq Scan on t_a2b (cost=0.00..89741.18
> rows=2485464 width=8) (actual time=0.010..669.616 rows=2492783 loops=1)
> Filter: (col_a = 1)
> Rows Removed by Filter: 2260712
> -> Hash (cost=108066.87..108066.87 rows=44917
> width=4) (actual time=172.884..172.884 rows=48909 loops=1)
> Buckets: 65536 Batches: 1 Memory Usage: 2232kB
> -> Index Only Scan using
> test_idx__t_a_col1_col2__v4 on t_a (cost=0.43..108066.87 rows=44917
> width=4) (actual time=0.031..160.088 rows=48909 loops=1)
> Index Cond: (col_1 = 730)
> Heap Fetches: 2426
> Planning time: 0.769 ms
> Execution time: 1337.861 ms
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vlad Arkhipov 2016-06-22 09:10:04 Protect a table against concurrent data changes while allowing to vacuum it
Previous Message Job 2016-06-22 07:38:14 R: Vacuum full: alternatives?