Partitioning update-heavy queue with hash partitions vs partial indexes

From: Dorian Hoxha <dorian(dot)hoxha(at)gmail(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Partitioning update-heavy queue with hash partitions vs partial indexes
Date: 2023-08-10 08:36:12
Message-ID: CANsFX04P_VXOhO19uPnQyo4vi67kw7q_Y3ZXYD8W5AYtCSG76g@mail.gmail.com

Hi list,

I have a queue table with the schema:

```
create table queue(id bigserial primary key, view_time timestamp with time zone not null);
create index queue_view_time ON queue(view_time ASC);
```

The most concurrent operation is:
```
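UPDATE queue SET view_time = view_time + INTERVAL '60 seconds' WHERE id = (
  -- view_time is timestamptz, so it is compared with now() directly;
  -- "now() at time zone 'utc'" would only be correct when the session TimeZone is UTC
  SELECT id FROM queue WHERE view_time <= now()
  ORDER BY view_time ASC LIMIT 1 FOR UPDATE SKIP LOCKED
);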
```
As you can imagine, with increased concurrency this query has to read and
skip a lot of locked and dead index entries, which costs a lot of CPU time.

I'm assuming 10K+ queries/second will run the update above and actually
update a row.
You may wonder how to sustain 10K connections, but that part is manageable:
the connection limit can be raised, the queries are fast, and a connection
pooler, auto-commit, etc. can be used.

--------------

Since most of the overhead is in the `queue_view_time` index, I thought of
partitioning just that with partial indexes and then querying the indexes
randomly. This is with 2 partitions:

```
create index queue_view_time_0 ON queue(view_time ASC) WHERE id%2=0;
create index queue_view_time_1 ON queue(view_time ASC) WHERE id%2=1;
```
I would then add `id%2=0` (or `id%2=1`) to the WHERE clause of the select
query above and try the partitions in random order until I get a row or have
searched all partitions.
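
For concreteness, the per-partition variant of the dequeue statement might look like the sketch below. It assumes the two partial indexes above exist and that the client picks the remainder order at random, retrying with the next remainder when no row comes back. The `RETURNING id` clause is an addition for illustration, so the client can tell whether a row was claimed. The remainder is written as a literal because the planner matches partial index predicates at plan time, so a parameterized remainder may not match the index.

```sql
-- Sketch: dequeue using only the id%2=0 partial index.
-- If this returns no row, the client retries with id%2=1.
UPDATE queue SET view_time = view_time + INTERVAL '60 seconds' WHERE id = (
  SELECT id FROM queue
  WHERE id%2=0                -- same form as the predicate of queue_view_time_0
    AND view_time <= now()    -- view_time is timestamptz, so now() compares directly
  ORDER BY view_time ASC LIMIT 1 FOR UPDATE SKIP LOCKED
)
RETURNING id;
```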

----------------
But the docs at
https://www.postgresql.org/docs/current/indexes-partial.html say:

> Do Not Use Partial Indexes as a Substitute for Partitioning
>
> While a search in this larger index might have to descend through a
> couple more tree levels than a search in a smaller index, that's almost
> certainly going to be cheaper than the planner effort needed to select the
> appropriate one of the partial indexes. The core of the problem is that the
> system does not understand the relationship among the partial indexes, and
> will laboriously test each one to see if it's applicable to the current
> query.

Would this be true in my case too?
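
One way to check this empirically, sketched under the assumption that the partial indexes above are in place, is to compare the `Planning Time` and `Execution Time` lines that `EXPLAIN (ANALYZE)` reports as the number of partial indexes grows. Because `EXPLAIN ANALYZE` really executes the statement and takes the row lock, the sketch wraps it in a transaction that is rolled back.

```sql
BEGIN;
-- Prints the chosen plan plus "Planning Time" and "Execution Time".
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM queue
WHERE id%2=0 AND view_time <= now()
ORDER BY view_time ASC LIMIT 1 FOR UPDATE SKIP LOCKED;
ROLLBACK;  -- release the row lock taken by FOR UPDATE
```

Repeating the measurement with 2, 8 and 32 partial indexes would show how much of the per-query cost is planner effort rather than index scanning.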

Is it faster for the planner to select the correct partition (with hash
partitioning on the `id` column) than to select the correct partial index,
as in my case? I don't think I'll need more than ~32 partitions or partial
indexes, even in an extreme scenario.
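
For comparison, the hash-partitioned layout could be sketched as follows. This is an illustrative, untested layout with 2 partitions; the partition names `queue_p0` and `queue_p1` are hypothetical. Note that hash partitioning places rows by a hash of `id`, not by `id%2`, so a predicate like `id%2=0` on the parent table would not prune partitions. The sketch therefore targets one partition by name, which also avoids any partition selection at plan time.

```sql
-- Sketch: hash-partitioned queue with 2 partitions (names are illustrative).
create table queue(
  id bigserial,
  view_time timestamp with time zone not null,
  primary key (id)   -- allowed because the partition key (id) is part of the key
) partition by hash (id);

create table queue_p0 partition of queue for values with (modulus 2, remainder 0);
create table queue_p1 partition of queue for values with (modulus 2, remainder 1);

-- An index created on the parent is created on every partition as well.
create index queue_view_time ON queue(view_time ASC);

-- Dequeue from one partition directly; the client tries partitions in random order.
UPDATE queue_p0 SET view_time = view_time + INTERVAL '60 seconds' WHERE id = (
  SELECT id FROM queue_p0 WHERE view_time <= now()
  ORDER BY view_time ASC LIMIT 1 FOR UPDATE SKIP LOCKED
)
RETURNING id;
```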

Regards,
Dorian
