Re: Most efficient way of querying M 'related' tables where N out of M may contain the key

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Stephane Bailliez <sbailliez(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Most efficient way of querying M 'related' tables where N out of M may contain the key
Date: 2015-08-21 12:24:37
Message-ID: CAKFQuwZhfAGJrhcQ0F-F9VHqyX_CAaSku5Dk8jm+YWh9hWWYDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Aug 21, 2015 at 8:07 AM, Stephane Bailliez <sbailliez(at)gmail(dot)com>
wrote:

>
> On Thu, Aug 20, 2015 at 8:19 PM, David G. Johnston <
> david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
>>
>> ​SELECT [...]
>> FROM (SELECT reference_id, [...] FROM table_where_referenced_id_is_a_pk
>> WHERE reference_id EXISTS/IN/JOIN)​
>>
>> ​src
>> ​LEFT JOIN type1 USING (reference_id)
>> LEFT JOIN type2 USING (reference_id)
>> [...]
>>
>
>
​Place ^ in a CTE named (find_all)​

> there are no tables where reference_id is a pk, I could create one or do :
> select reference_id from ( values (..), (...), (...) .... )
>
> the tricky part with the join (and where I was not clear about it in my
> original description) is that a reference_id can match in multiple tables
> (eg. it can be a fk in type1 and type2), so it then becomes a bit harder to
> collect all the common attributes and 'types' when doing joins like this.
>
> For example let's assume there is a group_id to be be retrieved among all
> tables as a common attribute:
>
> if reference_id was existing only in one table, I could do
> coalesce(type1.group_id, ... type5.group_id) as group_id in the main select
> however that would not work in this case.
>
>
​WITH find_all (reference_id, type_identifier, type_id) AS ( ... )
SELECT ​type_identifier, array_agg(reference_id), array_agg(type_id)
FROM find_all
WHERE type_identifier IS NOT NULL
GROUP BY type_identifier

​find_all will return at least one row, possibly empty if no matches are
present, and will return multiple rows if more than one matches. You can
use array_agg as shown, or play around with custom composite types, ​or
even build a JSON document.

David J.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Genc 2015-08-21 12:48:37 Performance bottleneck due to array manipulation
Previous Message Stephane Bailliez 2015-08-21 12:07:26 Re: Most efficient way of querying M 'related' tables where N out of M may contain the key