Re: Unexpected result count from update statement on partitioned table

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Unexpected result count from update statement on partitioned table
Date: 2020-12-18 13:23:51
Message-ID: fcb781c5-9b1c-293c-f9d5-0a4709799d61@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Would (task_type in (1,2)) make any logical difference?

On 12/18/20 6:11 AM, Craig McIlwee wrote:
> Despite looking at this query on and off for a couple of days, it wasn't
> until seeing it in Lauenz's reply that I noticed  a logical issue with the
> query that changes things a bit.  There should be parenthesis around the
> task_type predicates, otherwise you end up getting reserved rows in the
> result set.  After looking at it more, I see that when the original query
> settles in to consistently returning the expected number of results they
> happen to be the exact same results each time.  Correcting the logical
> error in the query solves that and now consistently gives too many
> results, regardless of the number of executions.
>
> Corrected query (but still not working properly):
>
> update task_parent
> set reserved = true
> from (
>   select id
>   from task_parent
>   where reserved = false
>     and (task_type = 1 or task_type = 2)
>   order by task_timestamp
>   limit 50
>   for update skip locked) as sub
> where sub.id <http://sub.id> = task_parent.id <http://task_parent.id>
> returning *;
>
> Yes, this must be a bug:
>
> EXPLAIN (COSTS OFF) update task_parent
> set reserved = true
> from (
>   select id
>   from task_parent
>   where reserved = false
>     and task_type = 1 or task_type = 2
>   order by task_timestamp
>   limit 50
>   for update skip locked) as sub
> where sub.id <http://sub.id/> = task_parent.id <http://task_parent.id/>
> returning task_parent.id <http://task_parent.id/>;
>
>                                                         QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Update on task_parent
>    Update on task_child_1 task_parent_1
>    Update on task_child_2 task_parent_2
>    ->  Hash Join
>          Hash Cond: (task_parent_1.id <http://task_parent_1.id/> =
> sub.id <http://sub.id/>)
>          ->  Seq Scan on task_child_1 task_parent_1
>          ->  Hash
>                ->  Subquery Scan on sub
>                      ->  Limit
>                            ->  LockRows
>                                  ->  Merge Append
>                                        Sort Key:
> task_parent_3.task_timestamp
>                                        ->  Index Scan using
> task_child_1_task_timestamp_idx on task_child_1 task_parent_4
>                                              Filter: (((NOT reserved)
> AND (task_type = 1)) OR (task_type = 2))
>                                        ->  Index Scan using
> task_child_2_task_timestamp_idx on task_child_2 task_parent_5
>                                              Filter: (((NOT reserved)
> AND (task_type = 1)) OR (task_type = 2))
>    ->  Hash Join
>          Hash Cond: (task_parent_2.id <http://task_parent_2.id/> =
> sub_1.id <http://sub_1.id/>)
>          ->  Seq Scan on task_child_2 task_parent_2
>          ->  Hash
>                ->  Subquery Scan on sub_1
>                      ->  Limit
>                            ->  LockRows
>                                  ->  Merge Append
>                                        Sort Key:
> task_parent_6.task_timestamp
>                                        ->  Index Scan using
> task_child_1_task_timestamp_idx on task_child_1 task_parent_7
>                                              Filter: (((NOT reserved)
> AND (task_type = 1)) OR (task_type = 2))
>                                        ->  Index Scan using
> task_child_2_task_timestamp_idx on task_child_2 task_parent_8
>                                              Filter: (((NOT reserved)
> AND (task_type = 1)) OR (task_type = 2))
> (29 rows)
>
> The subquery is executed twice, and the two executions obviously don't
> return the same results.  I am at a loss for an explanation ...
>
>
> I did notice that the subquery is executed twice and this is the
> difference between the subquery and CTE since the CTE is only executed
> once and then the CTE result is used in the joins against the child
> tables.  Beyond that, I was unable to rationalize what was going on.  One
> thing I don't know is if the subquery should be executed more than once. 
> If yes, then like you said, I would think that they should give the same
> results.
>
> Another interesting thing I noticed is that multiple executions give tasks
> with overlapping time ranges.  Given 2 executions, the task_time ordering
> should give the oldest 50 first and then the next oldest 50 after that. 
> In reality, I see that the second execution has tasks that are older than
> the newest timestamp of the first execution.  Wrapping the update
> statement in a CTE and then picking out the min/max shows this:
>
> with updated as (
>   update task_parent
>   set reserved = true
>   from (
>     select id, task_type
>     from task_parent
>     where reserved = false
>       and (task_type = 1 or task_type = 2)
>     order by task_timestamp
>     limit 50
>     for update skip locked) as sub
>   where sub.id <http://sub.id> = task_parent.id <http://task_parent.id>
>     and sub.task_type = task_parent.task_type
>   returning *
> )
> select min(task_timestamp), max(task_timestamp)
> from updated;
>
> Execution 1:
>             min             |            max
> ----------------------------+----------------------------
>  2020-12-17 11:44:51.192119 | 2020-12-17 11:45:03.881409
>
> Execution 2:
>             min             |            max
> ----------------------------+----------------------------
>  2020-12-17 11:44:59.943108 | 2020-12-17 11:45:14.273185
>
> min of execution 2 is older than max of execution 1 which should not happen.
>
> Craig
>
> On Fri, Dec 18, 2020 at 4:06 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at
> <mailto:laurenz(dot)albe(at)cybertec(dot)at>> wrote:
>
> On Thu, 2020-12-17 at 12:21 -0500, Craig McIlwee wrote:
> > Our application uses a queue-like table to assign tasks to users and
> this has worked well for us for a few years.  Now we are in the
> process of adding some restrictions to which tasks a user can
> > work on and that is based on an attribute of each task that does not
> change for the task's lifespan.  Users may have access to work on one
> or more or types of tasks.  To improve query time when
> > finding the set of tasks that we assign, we are introducing
> partitioning into our task queue table.  When assigning tasks, we
> issue an update statement to mark the tasks as reserved using a subquery
> > that orders the tasks by age.  With the introduction of
> partitioning, we are seeing that the update statement affects more
> rows than expected.  An example query is:
> >
> > ---
> > update task_parent
> > set reserved = true
> > from (
> >   select id
> >   from task_parent
> >   where reserved = false
> >     and task_type = 1 or task_type = 2
> >   order by task_timestamp
> >   limit 50
> >   for update skip locked) as sub
> > where sub.id <http://sub.id> = task_parent.id <http://task_parent.id>
> > returning task_parent.id <http://task_parent.id>
> > ---
> >
> > In the statement above, we have a subquery to limit the number of
> tasks to 50 yet the update statement sometimes returns more than 50
> records.  I have narrowed this down to a small, reproducible
> > example shown below.  The first time I run the update statement I
> get ~65 records, then typically ~53 the next few runs, and then it
> starts consistently giving me 50 records after that.  Then if I
> > bump the limit to 100 I will get more than 100 initially and after
> several executions it starts to settle into always giving the expected
> 100.
> >
> > Below is the full setup that can be used to reproduce what I'm
> seeing.  It was initially observed on PostgreSQL 11.8 but I can also
> reproduce it on 13.0.
> >
> > ---
> > create table task_parent (
> >   id bigint not null,
> >   task_type smallint not null,
> >   reserved boolean not null,
> >   task_timestamp timestamp not null
> > ) partition by list (task_type);
> >
> > create table task_child_1
> > partition of task_parent for values in (1);
> >
> > create table task_child_2
> > partition of task_parent for values in (2);
> >
> > insert into task_parent
> > select
> >   generate_series(1, 500000),
> >   case when random() < 0.5 then 1 else 2 end,
> >   false,
> >   now() - (random() * '1 day'::interval);
> >
> > create index task_parent_task_time_idx
> > on task_parent (task_timestamp);
> >
> > update task_parent
> > set reserved = true
> > from (
> >   select id
> >   from task_parent
> >   where reserved = false
> >     and task_type = 1 or task_type = 2
> >   order by task_timestamp
> >   limit 50
> >   for update skip locked) as sub
> > where sub.id <http://sub.id> = task_parent.id <http://task_parent.id>
> > returning task_parent.id <http://task_parent.id>;
> > ---
> >
> > A couple of interesting observations:
> > 1) If I remove the order by clause I always get the expected number
> of results
> > 2) If I rewrite the query to use a CTE for the task IDs instead of a
> subquery then I always get the expected number of results
> >
> > At its surface, this seems like it could be a bug but maybe there is
> something about this usage pattern that is known/expected to cause
> this behavior.  So that's the question - is this a bug that
> > should be reported to pgsql-bugs, or is this expected and if so, why?
>
> Yes, this must be a bug:
>
> EXPLAIN (COSTS OFF) update task_parent
> set reserved = true
> from (
>   select id
>   from task_parent
>   where reserved = false
>     and task_type = 1 or task_type = 2
>   order by task_timestamp
>   limit 50
>   for update skip locked) as sub
> where sub.id <http://sub.id> = task_parent.id <http://task_parent.id>
> returning task_parent.id <http://task_parent.id>;
>
>                                                         QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Update on task_parent
>    Update on task_child_1 task_parent_1
>    Update on task_child_2 task_parent_2
>    ->  Hash Join
>          Hash Cond: (task_parent_1.id <http://task_parent_1.id> =
> sub.id <http://sub.id>)
>          ->  Seq Scan on task_child_1 task_parent_1
>          ->  Hash
>                ->  Subquery Scan on sub
>                      ->  Limit
>                            ->  LockRows
>                                  ->  Merge Append
>                                        Sort Key:
> task_parent_3.task_timestamp
>                                        ->  Index Scan using
> task_child_1_task_timestamp_idx on task_child_1 task_parent_4
>                                              Filter: (((NOT reserved)
> AND (task_type = 1)) OR (task_type = 2))
>                                        ->  Index Scan using
> task_child_2_task_timestamp_idx on task_child_2 task_parent_5
>                                              Filter: (((NOT reserved)
> AND (task_type = 1)) OR (task_type = 2))
>    ->  Hash Join
>          Hash Cond: (task_parent_2.id <http://task_parent_2.id> =
> sub_1.id <http://sub_1.id>)
>          ->  Seq Scan on task_child_2 task_parent_2
>          ->  Hash
>                ->  Subquery Scan on sub_1
>                      ->  Limit
>                            ->  LockRows
>                                  ->  Merge Append
>                                        Sort Key:
> task_parent_6.task_timestamp
>                                        ->  Index Scan using
> task_child_1_task_timestamp_idx on task_child_1 task_parent_7
>                                              Filter: (((NOT reserved)
> AND (task_type = 1)) OR (task_type = 2))
>                                        ->  Index Scan using
> task_child_2_task_timestamp_idx on task_child_2 task_parent_8
>                                              Filter: (((NOT reserved)
> AND (task_type = 1)) OR (task_type = 2))
> (29 rows)
>
> The subquery is executed twice, and the two executions obviously don't
> return the same results.  I am at a loss for an explanation ...
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>

--
Angular momentum makes the world go 'round.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Joel Jacobson 2020-12-18 15:23:40 Avoid undesired flattening of jsonb arrays?
Previous Message Craig McIlwee 2020-12-18 12:11:56 Re: Unexpected result count from update statement on partitioned table