optimizing a query

From: Jonathan Vanasco <postgres(at)2xlp(dot)com>
To: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: optimizing a query
Date: 2016-06-21 22:33:50
Message-ID: 92E3E3EB-8CD7-4950-B51A-33EEDA84E765@2xlp.com
Lists: pgsql-general

I have a handful of queries in the following general form that I can't seem to optimize any further (same results on 9.3, 9.4, and 9.5).

I'm wondering if anyone might have a suggestion, or if these queries are as optimized as they can get.

The relevant table structure:

t_a2b
    a_id INT REFERENCES t_a(id)
    b_id INT REFERENCES t_b(id)
    col_a

t_a
    id INT
    col_1 INT
    col_2 BOOL
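
(In full DDL terms, that's roughly the following; the type of col_a and the shape of t_b aren't shown above, so those parts are placeholders:)

CREATE TABLE t_a (
    id INT PRIMARY KEY,
    col_1 INT,
    col_2 BOOL
);

CREATE TABLE t_b (
    id INT PRIMARY KEY   -- only t_b's id is referenced here; its other columns aren't shown above
);

CREATE TABLE t_a2b (
    a_id INT REFERENCES t_a(id),
    b_id INT REFERENCES t_b(id),
    col_a INT            -- type not shown above; INT is a guess
);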

The SELECTs query the association table (t_a2b) and join in the related table (t_a) for some filtering.

In an effort to simplify the work, I've created indexes on t_a that cover all of the relevant columns.

CREATE INDEX test_idx ON t_a(col_1, id);
CREATE INDEX test_idx__a ON t_a(col_1, id) WHERE col_2 IS NOT FALSE;

postgres scans test_idx__a first (yay!), but then does a bitmap heap scan on t_a and uses the raw t_a table for the hash join.

I don't actually need any information from t_a - it's just there for the filtering, and ideally postgres would just use the index.

I thought this might have been from using a partial index, but the same results happen with a full index. I just can't seem to avoid this hash join against the full table.
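
(To be clear, by "just use the index" I mean the kind of index-only access you'd get from a standalone lookup like the one below; visibility map permitting, it can in principle be answered from test_idx__a alone. The literal 730 is just the value from the example query further down.)

-- a bare lookup that only touches columns covered by test_idx__a
EXPLAIN
SELECT id
FROM t_a
WHERE col_1 = 730
  AND col_2 IS NOT FALSE;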

Anyone have a suggestion?

Example query:

SELECT t_a2b.b_id AS b_id,
       count(t_a2b.b_id) AS counted
FROM t_a2b
JOIN t_a ON t_a.id = t_a2b.a_id
WHERE t_a2b.col_a = 1
  AND t_a.col_1 = 730
  AND t_a.col_2 IS NOT FALSE
GROUP BY t_a2b.b_id
ORDER BY counted DESC,
         t_a2b.b_id ASC
;
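
(The plan behavior described above is what I see when I run the query under EXPLAIN, roughly along these lines; the ANALYZE and BUFFERS options are just what I happened to use:)

EXPLAIN (ANALYZE, BUFFERS)
SELECT t_a2b.b_id AS b_id,
       count(t_a2b.b_id) AS counted
FROM t_a2b
JOIN t_a ON t_a.id = t_a2b.a_id
WHERE t_a2b.col_a = 1
  AND t_a.col_1 = 730
  AND t_a.col_2 IS NOT FALSE
GROUP BY t_a2b.b_id
ORDER BY counted DESC,
         t_a2b.b_id ASC;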
