From: | Hellmuth Vargas <hivs77(at)gmail(dot)com> |
---|---|
To: | mkslaf(at)keemail(dot)me |
Cc: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Efficiently searching for the most recent rows where a column matches any result from a different query |
Date: | 2018-02-13 21:13:13 |
Message-ID: | CAN3Qy4q4d7OK9O0P1uyfUR=J9jQtZwM6f1OjhfAbM4HDGMp5NQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello:
EXPLAIN (ANALYZE, BUFFERS)
select * from (
SELECT posts.id, users.name, posts.content
FROM posts JOIN users ON posts.user_id = users.id
WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id =
1)
ORDER BY posts.id DESC
) as a
ORDER BY a.id DESC
LIMIT 10;
------
EXPLAIN (ANALYZE, BUFFERS)
select * from (
SELECT posts.id, users.name, posts.content
FROM posts JOIN users ON posts.user_id = users.id
WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id =
2)
ORDER BY posts.id DESC
) as a
ORDER BY a.id DESC
LIMIT 10;
2018-02-13 8:28 GMT-05:00 <mkslaf(at)keemail(dot)me>:
> Hello,
>
> I have the following schema:
>
> CREATE TABLE users (
> id BIGSERIAL PRIMARY KEY,
> name TEXT NOT NULL UNIQUE
> );
>
> CREATE TABLE friends (
> user_id BIGINT NOT NULL REFERENCES users,
> friend_user_id BIGINT NOT NULL REFERENCES users,
> UNIQUE (user_id, friend_user_id)
> );
>
> CREATE TABLE posts (
> id BIGSERIAL PRIMARY KEY,
> user_id BIGINT NOT NULL REFERENCES users,
> content TEXT NOT NULL
> );
> CREATE INDEX posts_user_id_id_index ON posts(user_id, id);
>
> Each user can unilaterally follow any number of friends. The posts table
> has a large number of rows and is rapidly growing.
>
> My goal is to retrieve the 10 most recent posts of a user's friends. This
> query gives the correct result, but is inefficient:
>
> SELECT posts.id, users.name, posts.content
> FROM posts JOIN users ON posts.user_id = users.id
> WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE
> user_id = 1)
> ORDER BY posts.id DESC LIMIT 10;
>
> If the user's friends have recently posted, the query is still reasonably
> fast (https://explain.depesz.com/s/6ykR) But if the user's friends
> haven't recently posted or the user has no friends, it quickly deteriorates
> (https://explain.depesz.com/s/OnoG)
>
> If I match only a single post author (e.g. WHERE posts.user_id = 5),
> Postgres uses the index posts_user_id_id_index. But if I use IN, the index
> doesn't appear to be used at all.
>
> How can I get these results more efficiently?
>
> I've uploaded the schema and the queries I've tried to dbfiddle at
> http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=
> cf1489b7f6d53c3fe0b55ed7ccbad1f0. The output of "SELECT version()" is
> "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10)
> 4.9.2, 64-bit" for me.
>
> Thank you in advance for any insights, pointers or suggestions you are
> able to give me.
>
> Regards,
> Milo
>
--
Cordialmente,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
From | Date | Subject | |
---|---|---|---|
Next Message | Gunnar "Nick" Bluth | 2018-02-13 21:15:31 | Re: Details after Load Peak was: OT: Performance of VM |
Previous Message | mkslaf | 2018-02-13 13:28:07 | Efficiently searching for the most recent rows where a column matches any result from a different query |