Unexpected result count from update statement on partitioned table

From: Craig McIlwee <craigm(at)vt(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Unexpected result count from update statement on partitioned table
Date: 2020-12-17 17:21:45
Message-ID: CAGqBcTZJOBpX4kaP-5ySn27=4poFkzm7XN_ZktmuUrbZmwLreg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

Our application uses a queue-like table to assign tasks to users and this
has worked well for us for a few years. Now we are in the process of
adding some restrictions to which tasks a user can work on and that is
based on an attribute of each task that does not change for the task's
lifespan. Users may have access to work on one or more or types of tasks.
To improve query time when finding the set of tasks that we assign, we are
introducing partitioning into our task queue table. When assigning tasks,
we issue an update statement to mark the tasks as reserved using a subquery
that orders the tasks by age. With the introduction of partitioning, we
are seeing that the update statement affects more rows than expected. An
example query is:

---
update task_parent
set reserved = true
from (
select id
from task_parent
where reserved = false
and task_type = 1 or task_type = 2
order by task_timestamp
limit 50
for update skip locked) as sub
where sub.id = task_parent.id
returning task_parent.id
---

In the statement above, we have a subquery to limit the number of tasks to
50 yet the update statement sometimes returns more than 50 records. I have
narrowed this down to a small, reproducible example shown below. The first
time I run the update statement I get ~65 records, then typically ~53 the
next few runs, and then it starts consistently giving me 50 records after
that. Then if I bump the limit to 100 I will get more than 100 initially
and after several executions it starts to settle into always giving the
expected 100.

Below is the full setup that can be used to reproduce what I'm seeing. It
was initially observed on PostgreSQL 11.8 but I can also reproduce it on
13.0.

---
create table task_parent (
id bigint not null,
task_type smallint not null,
reserved boolean not null,
task_timestamp timestamp not null
) partition by list (task_type);

create table task_child_1
partition of task_parent for values in (1);

create table task_child_2
partition of task_parent for values in (2);

insert into task_parent
select
generate_series(1, 500000),
case when random() < 0.5 then 1 else 2 end,
false,
now() - (random() * '1 day'::interval);

create index task_parent_task_time_idx
on task_parent (task_timestamp);

update task_parent
set reserved = true
from (
select id
from task_parent
where reserved = false
and task_type = 1 or task_type = 2
order by task_timestamp
limit 50
for update skip locked) as sub
where sub.id = task_parent.id
returning task_parent.id;
---

A couple of interesting observations:
1) If I remove the order by clause I always get the expected number of
results
2) If I rewrite the query to use a CTE for the task IDs instead of a
subquery then I always get the expected number of results

At its surface, this seems like it could be a bug but maybe there is
something about this usage pattern that is known/expected to cause this
behavior. So that's the question - is this a bug that should be reported
to pgsql-bugs, or is this expected and if so, why?

Craig

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gustavsson Mikael 2020-12-18 08:17:03 SV: SV: SV: Problem with ssl and psql in Postgresql 13
Previous Message Pavel Stehule 2020-12-17 16:58:22 Re: Raise exception without using plpgsql?