Efficiently searching for the most recent rows where a column matches any result from a different query

From: <mkslaf(at)keemail(dot)me>
To: <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Efficiently searching for the most recent rows where a column matches any result from a different query
Date: 2018-02-13 13:28:07
Message-ID: L5EKys7--3-0@keemail.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,
I have the following schema:
    CREATE TABLE users (        id   BIGSERIAL PRIMARY KEY,        name TEXT      NOT NULL UNIQUE    );        CREATE TABLE friends (        user_id        BIGINT NOT NULL REFERENCES users,        friend_user_id BIGINT NOT NULL REFERENCES users,        UNIQUE (user_id, friend_user_id)    );        CREATE TABLE posts (        id      BIGSERIAL PRIMARY KEY,        user_id BIGINT    NOT NULL REFERENCES users,        content TEXT      NOT NULL    );    CREATE INDEX posts_user_id_id_index ON posts(user_id, id);
Each user can unilaterally follow any number of friends. The posts table has a large number of rows and is rapidly growing.
My goal is to retrieve the 10 most recent posts of a user's friends. This query gives the correct result, but is inefficient:
    SELECT posts.id, users.name, posts.content    FROM posts JOIN users ON posts.user_id = users.id    WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1)    ORDER BY posts.id DESC LIMIT 10;
If the user's friends have recently posted, the query is still reasonably fast (https://explain.depesz.com/s/6ykR) But if the user's friends haven't recently posted or the user has no friends, it quickly deteriorates (https://explain.depesz.com/s/OnoG)
If I match only a single post author (e.g. WHERE posts.user_id = 5), Postgres uses the index posts_user_id_id_index. But if I use IN, the index doesn't appear to be used at all.
How can I get these results more efficiently?
I've uploaded the schema and the queries I've tried to dbfiddle at http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cf1489b7f6d53c3fe0b55ed7ccbad1f0. The output of "SELECT version()" is "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit" for me.
Thank you in advance for any insights, pointers or suggestions you are able to give me.
Regards,Milo

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Hellmuth Vargas 2018-02-13 21:13:13 Re: Efficiently searching for the most recent rows where a column matches any result from a different query
Previous Message Andrew Kerber 2018-02-10 14:58:41 Re: OT: Performance of VM