From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Subject: | Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE |
Date: | 2023-01-31 14:05:17 |
Message-ID: | CAEze2Wh_0XJGQ2o+2OBx-G9rJdV+ZqWv+8GqLzg-Hahc3_gvkQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, 30 Jan 2023 at 21:19, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-01-10 21:32:54 +0100, Matthias van de Meent wrote:
> > On Tue, 10 Jan 2023 at 20:14, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
> > > What precisely do you mean with "skew" here? Do you just mean that it'd take a
> > > long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
> > > you might mean more than that?
> >
> > h->oldest_considered_running can be extremely old due to the global
> > nature of the value and the potential existence of a snapshot in
> > another database that started in parallel to a very old running
> > transaction.
>
> Here's a version that, I think, does not have that issue.
Thanks!
> In an earlier, not posted, version I had an vacuum_defer_cleanup_age specific
> helper function for this, but it seems likely we'll need it in other places
> too. So I named it TransactionIdRetreatSafely(). I made it accept the xid by
> pointer, as the line lengths / repetition otherwise end up making it hard to
> read the code. For now I have TransactionIdRetreatSafely() be private to
> procarray.c, but I expect we'll have to change that eventually.
If TransactionIdRetreatSafely will be exposed outside procarray.c,
then I think the xid pointer should be replaced with normal
arguments/returns; both for parity with TransactionIdRetreatedBy and
to remove this memory store dependency in this hot code path.
> Not sure I like TransactionIdRetreatSafely() as a name. Maybe
> TransactionIdRetreatClamped() is better?
I think the 'safely' version is fine.
> I've been working on a test for vacuum_defer_cleanup_age. It does catch the
> corruption at hand, but not much more. It's quite painful to write, right
> now. Some of the reasons:
> https://postgr.es/m/20230130194350.zj5v467x4jgqt3d6%40awork3.anarazel.de
>
>
>
> > > I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
> > > seems like a mighty invasive change to backpatch.
> >
> > I'm not sure either. Protecting against underflow by halving the
> > effective valid value space is quite the intervention, but if it is
> > necessary to make this work in a performant manner, it would be worth
> > it. Maybe someone else with more experience can provide their opinion
> > here.
>
> The attached assertions just removes 1/2**32'ths of the space, by reserving
> the xid range with the upper 32bit set as something that shouldn't be
> reachable.
I think that is acceptible.
> Still requires us to change the input routines to reject that range, but I
> think that's a worthy tradeoff.
Agreed.
> I didn't find the existing limits for the
> type to be documented anywhere.
>
> Obviously something like that could only go into HEAD.
Yeah.
Comments on 0003:
> + /*
> + * FIXME, doubtful this is the best fix.
> + *
> + * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
> + * 0. Represent it as an xid from the future instead.
> + */
> + if (epoch == 0)
> + return FullTransactionIdFromEpochAndXid(0, xid);
Shouldn't this be an error condition instead, as this XID should not
be able to appear?
on 0004:
> - '0xffffffffffffffff'::xid8,
> - '-1'::xid8;
> + '0xefffffffffffffff'::xid8,
> + '0'::xid8;
The 0xFF... usages were replaced with "0xEFFF...". Shouldn't we also
test on 0xffff_fffE_ffff_ffff to test for input of our actual max
value?
> @@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
> while (*str != '\0')
> {
> /* read next value */
> - val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
> + raw_fxid = strtou64(str, &endp, 10);
> +
> + val = FullTransactionIdFromU64(raw_fxid);
> + if (!InFullTransactionIdRange(raw_fxid))
> + goto bad_format;
With assertions enabled FullTransactionIdFromU64 will assert the
InFullTransactionIdRange condition, meaning we wouldn't hit the branch
into bad_format.
I think these operations should be swapped, as parsing a snapshot
shouldn't run into assertion failures like this if it can error
normally. Maybe this can be added to tests as well?
Kind regards,
Matthias van de Meent
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2023-01-31 14:07:20 | Re: Speed up transaction completion faster after many relations are accessed in a transaction |
Previous Message | Melanie Plageman | 2023-01-31 14:02:24 | Re: heapgettup() with NoMovementScanDirection unused in core? |