From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | "Glen M(dot) Witherington" <glen(at)fea(dot)st> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Efficient sorting the results of a join, without denormalization |
Date: | 2015-05-31 04:33:13 |
Message-ID: | CAKFQuwb7_DegQ2KXBA6BsEz7ZC1iD5LjVm5mWHP+hQvMKsdM6A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Saturday, May 30, 2015, Glen M. Witherington <glen(at)fea(dot)st> wrote:
> Sorry about the horrendous subject, let me explain by example:
>
> Let's take this schema:
>
>
> ```
> CREATE TABLE a (
> id bigserial PRIMARY KEY,
> created_at timestamp with time zone NOT NULL DEFAULT NOW()
> );
>
> CREATE TABLE b(
> id bigserial PRIMARY KEY,
> a_id bigint NOT NULL REFERENCES a(id),
> created_at timestamp with time zone NOT NULL DEFAULT NOW()
> );
>
> CREATE TABLE c(
> id bigserial PRIMARY KEY,
> b_id bigint NOT NULL REFERENCES b(id),
> created_at timestamp with time zone NOT NULL DEFAULT NOW()
> );
> ```
>
> And let's fill it up with some dummy data, that roughly matches the
> distribution of mine:
>
> ```
> INSERT INTO a SELECT FROM generate_series(1, 5);
> INSERT INTO b(a_id) SELECT (i % 5) + 1 FROM generate_series(1, 100) i;
> INSERT INTO c(b_id) SELECT (trunc(random() * 100)+1) FROM
> generate_series(1, 1000000);
> ```
>
>
> And here's the query I want to do, efficiently:
>
> ````
> SELECT * FROM c
> JOIN b ON b.id = c.b_id
> JOIN a ON a.id = b.a_id
> WHERE a.id = 3
> ORDER BY b.created_at DESC
> LIMIT 10
> ```
>
> There seems to simply be no index I can put on the data, that allows me
> to run this query efficiently. Four hours of playing with this, the only
> solution I can see is, normalizing table `c` by adding a field "b's
> a_id" and then creating an index on (bs_a_id, created_at).
>
> But surely there's a better solution?
>
>
This is one problem with using made up surrogate keys...
The PK of A is a component of both the PK of B and the PK of C but you
throw that information away by using serial fields for PKs instead. You
should have unique indexes on B and C that incorporate the ID from A and
then indeed you will end up with a join sequence that can be executed
against efficiently.
All that said you really should put indexes on the foreign keys...
I haven't run the code to see the actual plan....
David J.
From | Date | Subject | |
---|---|---|---|
Next Message | Glen M. Witherington | 2015-05-31 04:43:25 | Re: Efficient sorting the results of a join, without denormalization |
Previous Message | Maxim Boguk | 2015-05-31 03:22:38 | Curious case of huge simple btree indexes bloat. |