Re: Slow join over three tables

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Tim van der Linden <tim(at)shisaa(dot)jp>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: Slow join over three tables
Date: 2016-04-27 11:48:06
Message-ID: 792DDB09-CCAC-41EB-9127-0425E84F06B6@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> On 27 Apr 2016, at 4:09, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> On 27 April 2016 at 11:27, Tim van der Linden <tim(at)shisaa(dot)jp> wrote:

>> The query:
>>
>> SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug
>> FROM reports r
>> JOIN report_drugs d ON d.rid = r.id
>> JOIN report_adverses a ON a.rid = r.id
>> WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back pain', 'back pain'])
>> AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363]) ORDER BY r.created;
>>

> Likely the only way to speed this up would be to create indexes;
>
> create index on reports (id, age, gender, created);

Since you're not using age and gender in this (particular) query until the rows are combined into a result set already, it doesn't make a whole lot of sense to add them to the index. Moreover, since your'e ordering by created, I'd at least put that directly after id:

create index on reports (id, created);

> create index on report_adverses (rid, adverse);

In this case, you're using the values in adverse to filter relevant rid's for the FK join, so you might be better off with the inverse of above index:
create index on report_adverses (adverse, rid);
create index on report_drugs (drug, rid);

Do these tables have a primary key and can that be used here?

I hope I'm not being superfluous here, but don't forget to vacuum analyze after creating those indexes. Eventually autovacuum will do that for you, but you could otherwise be running tests to verify the impact of adding those indexes before autovacuum gets around to it.

Finally, perhaps it's more efficient to weed out all unrelated drugs and adverses before relating them to reports, something like:

SELECT r.id, r.age, r.gender, r.created, x.adverse, x.drug
FROM reports r
JOIN (
SELECT d.rid, a.adverse, d.drug
FROM report_drugs d
JOIN report_adverses a ON a.rid = d.rid
WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back pain', 'back pain'])
AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363])
) x ON x.rid = r.id
ORDER BY r.created;

Looking at the cardinality of your tables that does seem a bit unlikely though. Still, worth a shot...

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tim van der Linden 2016-04-27 13:21:52 Re: Slow join over three tables
Previous Message Tim van der Linden 2016-04-27 10:49:26 Re: Slow join over three tables