Inconsistent query performance based on relation hit frequency

From: Laura Hausmann <laura(at)hausmann(dot)dev>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Inconsistent query performance based on relation hit frequency
Date: 2024-06-27 00:50:31
Message-ID: CAKGAs9m4tshy9Nhuxpm6kNEVZH4EHfvvtD=arkS2xH0HOd-2Gw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Heya, I hope the title is somewhat descriptive. I'm working on a
decentralized social media platform and have encountered the following
performance issue/quirk, and would like to ask for input, since I'm not
sure I missed anything.

I'm running PostgreSQL 16.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
13.2.1 20230801, 64-bit, running on an Arch Linux box with 128GB of RAM &
an 8c16t Ryzen 3700X CPU. Disk is a NVME RAID0.

Postgres configuration: https://paste.depesz.com/s/iTv

I'm using autovacuum defaults & am running a manual VACUUM ANALYZE on the
entire database nightly.

The relevant database parts consist of a table with posts (note), a table
with users (user), and a table with follow relationships (following). The
query in question takes the most recent n (e.g. 50) posts, filtered by the
users follow relations.

The note table on my main production instance grows by about 200k entries
per week.

Schema & tuple counts: https://paste.depesz.com/s/cfI

Here's the shortest query I can reproduce the issue with:
https://paste.depesz.com/s/RoC
Specifically, it works well for users that follow a relatively large amount
of users (https://explain.depesz.com/s/tJnB) and is very slow for users
that follow a low amount of users / users that post infrequently (
https://explain.depesz.com/s/Mtyr)

From what I can tell, this is because this query causes postgres to scan
the note table from the bottom (most recent posts first), discarding
anything by users that are not followed.

Curiously, rewriting the query like this (https://paste.depesz.com/s/8rN)
causes the opposite problem, this query is fast for users with a low
following count (https://explain.depesz.com/s/yHAz#query) and slow for
users with a high following count (https://explain.depesz.com/s/1v6L,
https://explain.depesz.com/s/yg3N)

These numbers are even further apart (to the point of 10-30s query
timeouts) in the most extreme outlier cases I've observed, and on lower-end
hardware.

I've sidestepped the issue by running either of these queries based on a
heuristic that checks whether there are more than 250 matching posts in the
past 7 days, recomputed once per day for every user, but it feels more like
a hack than a proper solution.

I'm able to make the planner make a sensible decision in both cases by
setting enable_sort = off, but that tanks performance for the rest of my
application, is even more of a hack, and doesn't seem to work in all cases.

I've been able to reproduce this issue with mock data (
https://paste.depesz.com/s/CnY) though it's not generating quite the same
query plans and is behaving a bit differently.

I'd appreciate any and all input on the situation. If I've left out any
information that would be useful in figuring this out, please tell me.

Thanks in advance,
Laura Hausmann

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andrei Lepikhov 2024-06-27 10:31:36 Re: Inconsistent query performance based on relation hit frequency
Previous Message Thomas Simpson 2024-06-24 22:28:13 Row level security