Most efficient way of querying M 'related' tables where N out of M may contain the key

From: Stephane Bailliez <sbailliez(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Most efficient way of querying M 'related' tables where N out of M may contain the key
Date: 2015-08-21 00:03:32
Message-ID: CACDwtRc5PaAd+6rZp9wsONd_7oa4-sVEA=xz=4a9f+DjeP3V1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Pretty bad subject description... but let me try to explain.

I'm trying to figure out what would be the most efficient way to query data
from multiple tables using a foreign key.

Right now the table design is such that I have multiple tables that share
some common information, and some specific information. (in OO world we
could see them as derived tables) For the sake of simplicity let's assume
there are only 5,

table type1(int id, varchar(24) reference_id, ....specific columns)
table type2(int id, varchar(24) reference_id, ....specific columns)
table type3(int id, varchar(24) reference_id, ....specific columns)
table type4(int id, varchar(24) reference_id, ....specific columns)
table type5(int id, varchar(24) reference_id, ....specific columns)

NB: you could imagine those 5 tables inheriting from a base_type table that
shares a few attributes.

I have a list of reference ids, those reference ids could be in any of
those 5 tables but I don't know in which one.

I want to most efficiently retrieve the data on N out of 5 relevant tables
but still want to query individually those 5 tables (for orm simplicity
reason).

So the idea is first to identify which tables I should query for.

The naive implementation would be to always query those 5 tables for the
list of reference ids, however 90% of the time the reference ids would only
be stored in one single table though. So 4/5th of the queries would then be
irrelevant.

what I initially came up with was doing a union of the tables such as:

SELECT 'type1', id FROM type5 WHERE reference_id IN (....)
UNION
SELECT 'type2', id FROM type4 WHERE reference_id IN (....)
UNION
...
SELECT 'type2', id FROM type3 WHERE reference_id IN (....)

then effectively figuring the list of which reference ids are in type1,
type2, type3, ...etc..

and then issuing the right select to the tables for the related reference
ids.

which means in best case scenario I would only do 2 queries instead of 5.
1 to retrieve the list of reference ids per 'type'
1 to retrieve the list of types with the corresponding reference ids

I'm trying to figure out if there is a more efficient way to retrieve this
information than doing a union across all tables (there can be a couple
hundreds reference ids to query for in the list)

I was thinking worse case scenario I could maintain this information in
another table via triggers to avoid doing this union, but that seems a bit
of a hammer solution initially and wondering if there is not something
simpler via joins that could be more performant.

Thanks for any suggestions.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David G. Johnston 2015-08-21 00:19:22 Re: Most efficient way of querying M 'related' tables where N out of M may contain the key
Previous Message Graeme B. Bell 2015-08-20 08:26:52 Re: incredible surprise news from intel/micron right now...