From: | "Glen M(dot) Witherington" <glen(at)fea(dot)st> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Efficient sorting the results of a join, without denormalization |
Date: | 2015-05-31 03:12:34 |
Message-ID: | 1433041954.1988470.282498121.5C7F9C07@webmail.messagingengine.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Sorry about the horrendous subject, let me explain by example:
Let's take this schema:
```
CREATE TABLE a (
id bigserial PRIMARY KEY,
created_at timestamp with time zone NOT NULL DEFAULT NOW()
);
CREATE TABLE b(
id bigserial PRIMARY KEY,
a_id bigint NOT NULL REFERENCES a(id),
created_at timestamp with time zone NOT NULL DEFAULT NOW()
);
CREATE TABLE c(
id bigserial PRIMARY KEY,
b_id bigint NOT NULL REFERENCES b(id),
created_at timestamp with time zone NOT NULL DEFAULT NOW()
);
```
And let's fill it up with some dummy data, that roughly matches the
distribution of mine:
```
INSERT INTO a SELECT FROM generate_series(1, 5);
INSERT INTO b(a_id) SELECT (i % 5) + 1 FROM generate_series(1, 100) i;
INSERT INTO c(b_id) SELECT (trunc(random() * 100)+1) FROM
generate_series(1, 1000000);
```
And here's the query I want to do, efficiently:
````
SELECT * FROM c
JOIN b ON b.id = c.b_id
JOIN a ON a.id = b.a_id
WHERE a.id = 3
ORDER BY b.created_at DESC
LIMIT 10
```
There seems to simply be no index I can put on the data, that allows me
to run this query efficiently. Four hours of playing with this, the only
solution I can see is, normalizing table `c` by adding a field "b's
a_id" and then creating an index on (bs_a_id, created_at).
But surely there's a better solution?
From | Date | Subject | |
---|---|---|---|
Next Message | Maxim Boguk | 2015-05-31 03:22:38 | Curious case of huge simple btree indexes bloat. |
Previous Message | Andres Freund | 2015-05-31 00:55:55 | Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |