Re: Efficient sorting the results of a join, without denormalization

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: "Glen M(dot) Witherington" <glen(at)fea(dot)st>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Efficient sorting the results of a join, without denormalization
Date: 2015-05-31 18:16:31
Message-ID: CA+bJJbzJA+3vnLt8w-kKcm0_gvHHjUs_DgkEqSBjzYC8v+ZcEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Glen:

On Sun, May 31, 2015 at 6:43 AM, Glen M. Witherington <glen(at)fea(dot)st> wrote:
> On Sat, May 30, 2015, at 11:33 PM, David G. Johnston wrote:
>> This is one problem with using made up surrogate keys...
>> The PK of A is a component of both the PK of B and the PK of C but you throw that information away by using serial fields for PKs instead. You should have unique indexes on B and C that incorporate the ID from A
>
> That is quite a strange schema, though isn't it? If you imagine it as
> emails:
>
> C = Emails
> B = Folder
> A = User
>
> Now you're suggesting that even though an email belongs to to a folder,
> which belongs to a user ... each email should also contain contain a
> reference to a user? I guess that's fine, but seems unideal from a
> redundancy perspective

It may seem, and be, unideal from a redundancy perspective, but keys
are more natural. It means you have user (Glen), folder (Glen, PGlist)
and message (Glen,PGlist,27), different from (Glen,Inbox,27) or (Glen,
PgList,28) or (Francisco,PgList,27) ( Where the 'tuples' I've printed
are the PK values ). This has a lot of advantages, which you pay for
in other ways, like redundancies, but having composite primary keys
sometimes work in your favor as you can express restrictions with the
relationships and build composite indexes for add hoc queries. In this
case ( an email database ), a serial could be used ( instead of the
name ) for the user and folder PK, but still have very fast, simple
queries from a MUA for things like 'select * from messages where
user_id = <Prefetched_id> and not read order by timestamp desc limit
100'. Also it will help catch things like mismatching folder ids, or
using the user id as folder id, which are easily made when all the
keys are synthetic and meaningless numbers.

As an example, I have a currency table, with it's serial key
currency_id, and a seller table, which sells just a currency and whose
pk is (currency_id+seller_id), and a rate table with rates
(currency_id, rate_id), and an allowed rates table ( to see which
rates a seller can use ), with primay key (currency_id, seller_id,
rate_id) and foreign keys (currency_id, seller_id) and (currency_id,
rate_id) ( it is more or less a classical example. The composite keys
guarantee I can only allow a seller to sell rates on her currency.

I can also, if needed, build unique indexes on any single id ( they
are all serials, as I have no other candidate keys ), if I need them,
but given the access patterns I normally have all of them, and things
like populating a drop box to allow new rates for a seller are very
easy.

Francisco Olarte.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bill Moran 2015-05-31 20:09:58 Re: Efficient sorting the results of a join, without denormalization
Previous Message Francisco Olarte 2015-05-31 17:42:46 Re: How to retrieve Comment text using SQL, not psql?