Re: SQL:2011 application time

From: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: SQL:2011 application time
Date: 2025-02-05 18:31:05
Message-ID: adbb0b45-2ff1-41a8-a119-b332f9d4bdc4@illuminatedcomputing.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/5/25 05:37, Peter Eisentraut wrote:
> On 29.01.25 07:34, Paul Jungwirth wrote:
>> Is it possible to commit an RI_PLAN_NO_ACTION addition and see if that makes the buildfarm
>> failures go away? Here is a proposed patch for that (v48.1). I would understand if this is too
>> questionable a practice---but it would be nice to get sufficient test exposure to see if it makes
>> a difference. Since I still haven't reproduced this locally (despite running continuously for
>> almost a week), it's not an experiment I can do myself. If it *does* make the failures go away,
>> then it suggests there is still some latent problem somewhere.
>
> I'm tempted to give this a try.  But the cfbot is currently in a bit of a mess, so I'll wait until
> that is clean again so that we can have a usable baseline to work against.

Okay, thanks! I've been spending some more time on this, but I haven't made much progress.

It's surely not as simple as just oid wrapround. Here is a bpftrace script to show when we change
TransamVariables->nextOid:

BEGIN {
@setnext = 0
}

u:/home/paul/local/bin/postgres:GetNewObjectId {
@newoids[tid] += 1
}

u:/home/paul/local/bin/postgres:SetNextObjectId {
@setnext += 1
}

When I run this during `make installcheck` I get only 29608 total calls to GetNewObjectId, and none
for SetNextObjectId.

I've also been looking at the dynahash code a bit. With gdb I can give two constraint oids a hash
collision, but of course that isn't sufficient, since we memcmp the whole key as well.

Last night I started looking at ri_constraint_cache, which is maybe a little more interesting due to
the syscache invalidation code. A parallel test could cause an invalidation between lines of the
without_overlaps test. Getting the wrong riinfo could make us treat a RESTRICT constraint as NO
ACTION. But I don't see any way for that to happen yet.

I have too much confidence in the Postgres codebase to really expect to find bugs in any of these
places. And yet I don't see how 1772d554b0 could make a RESTRICT test fail, since all its changes
are wrapped in `if (is_no_action)`---except if the RESTRICT constraint is somehow executing the NO
ACTION query by mistake.

Anyway I'll keep at it!

Yours,

--
Paul ~{:-)
pj(at)illuminatedcomputing(dot)com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-02-05 18:32:02 Re: Show WAL write and fsync stats in pg_stat_io
Previous Message Masahiko Sawada 2025-02-05 18:30:08 Re: Make COPY format extendable: Extract COPY TO format implementations