From: | Lars Vonk <lars(dot)vonk(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15715: UPDATE using "in (subquery for update limit 1)" does not respect the limit in subquery |
Date: | 2019-03-27 07:22:39 |
Message-ID: | CAMX1Thju8rO1cqbZPq=qj7+uENmssPw4SEBC2tzQxcUQ8AEDag@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
> And have you actually seen any such failures?
No not yet. And probably won't, given your explanation.
So what I understand so far is that allthough the query is not garantueed
to return the same single row, it is not possible it returned and updated
multiple rows in a single execution?
> Without seeing an
> EXPLAIN for this query on your system, we can't know whether it's being
> done like that
I ran the query again with EXPLAIN ANALYZE (but I am unsure if this means
anything since we do not have the exact EXPLAIN of that particular query):
Update on delayed_jobs (cost=146.81..154.85 rows=1 width=758) (actual
time=1.035..1.035 rows=0 loops=1)
-> Nested Loop (cost=146.81..154.85 rows=1 width=758) (actual
time=1.035..1.035 rows=0 loops=1)
-> HashAggregate (cost=146.54..146.55 rows=1 width=32) (actual
time=1.035..1.035 rows=0 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=146.51..146.53
rows=1 width=32) (actual time=1.034..1.034 rows=0 loops=1)
-> Limit (cost=146.51..146.52 rows=1 width=22)
(actual time=1.033..1.034 rows=0 loops=1)
-> LockRows (cost=146.51..146.52 rows=1
width=22) (actual time=1.033..1.033 rows=0 loops=1)
-> Sort (cost=146.51..146.51 rows=1
width=22) (actual time=1.033..1.033 rows=0 loops=1)
Sort Key: delayed_jobs_1.priority,
delayed_jobs_1.run_at
Sort Method: quicksort Memory: 25kB
-> Seq Scan on delayed_jobs
delayed_jobs_1 (cost=0.00..146.50 rows=1 width=22) (actual
time=1.010..1.010 rows=0 loops=1)
Filter: ((failed_at IS NULL)
AND ((queue)::text = 'workflows'::text) AND (((run_at <= '2019-03-26
13:25:22.208747'::timestamp without time zone) AND ((locked_at IS NULL) OR
(locked_at < '2019-03-26 09:25:22.20877'::timestamp without time zone))) OR
((locked_by)::text = 'host:job01.prod.jortt.nl pid:10029'::text)))
Rows Removed by Filter: 160
-> Index Scan using delayed_jobs_pkey on delayed_jobs
(cost=0.28..8.29 rows=1 width=206) (never executed)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.367 ms
Execution Time: 1.076 ms
(17 rows)
> Either way, a WITH is probably preferable because that does get you
> into a place where we guarantee single-evaluation semantics.
Will do.
Thanks again for your time and patience to explain.
Lars
On Tue, Mar 26, 2019 at 11:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Lars Vonk <lars(dot)vonk(at)gmail(dot)com> writes:
> > Thanks for the explanation. Our hotfix was to replace the IN with a = so
> > the query fails and the transaction is rolled back when more than one is
> > returned from the sub-SELECT.
>
> And have you actually seen any such failures? If you have, then there
> is certainly something going wrong here. The LIMIT should never allow
> more than one row to be returned by the sub-SELECT, period, full stop.
>
> The issue here is that the sub-SELECT is, in principle, executed afresh
> for every row of the outer query, and you do not have a guarantee that
> each such execution returns the *same* single row. Ordinary query
> execution would provide such a guarantee, but you're using FOR UPDATE
> on a table that's being modified concurrently (including by this query
> itself), and therefore the guarantee disappears.
>
> You might think that since the sub-SELECT is uncorrelated with the outer
> query, there's no need to execute it more than once ... but that's an
> optimization, not part of the guaranteed semantics. Without seeing an
> EXPLAIN for this query on your system, we can't know whether it's being
> done like that (though the fact that you're complaining suggests that
> it isn't).
>
> Interestingly, it's quite likely that your "hotfix" made the problem
> go away, because PG generally *does* optimize uncorrelated sub-SELECTs
> to be executed only once if they return scalar results. That doesn't
> happen for IN subqueries though, as those are treated as joins.
>
> Either way, a WITH is probably preferable because that does get you
> into a place where we guarantee single-evaluation semantics.
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | Dean Rasheed | 2019-03-27 12:46:29 | Re: BUG #15708: RLS 'using' running as wrong user when called from a view |
Previous Message | Michael Paquier | 2019-03-27 02:56:20 | Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table |