From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: recent failures on lorikeet |
Date: | 2021-06-14 17:18:43 |
Message-ID: | 241120.1623691123@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> The line in lmgr.c is where the process title gets changed to "waiting".
> I recently stopped setting process title on this animal on REL_13_STABLE
> and its similar errors have largely gone away.
Oooh, that certainly seems like a smoking gun.
> I can do the same on
> HEAD. But it does make me wonder what the heck has changed to make this
> code fragile.
So what we've got there is:

```c
old_status = get_ps_display(&len);
new_status = (char *) palloc(len + 8 + 1);
memcpy(new_status, old_status, len);
strcpy(new_status + len, " waiting");
set_ps_display(new_status);
new_status[len] = '\0'; /* truncate off " waiting" */
```
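The buffer arithmetic in that snippet can be checked in isolation. The sketch below is not PostgreSQL code: `fake_get_ps_display()` and `make_waiting_status()` are hypothetical names, and `malloc()` stands in for `palloc()`. It shows that `len + 8 + 1` leaves room for the 8 characters of `" waiting"` plus the terminating NUL, so the `strcpy()` always writes inside a buffer we just allocated. The `memcpy()` is the only step that reads through a pointer and length supplied by someone else.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for get_ps_display(): returns the current status
 * text and stores its length in *len.
 */
static const char *fake_get_ps_display(int *len)
{
    static const char status[] = "postgres: idle";

    *len = (int) strlen(status);
    return status;
}

/*
 * Build "<old status> waiting" the way the quoted snippet does, with
 * malloc() standing in for palloc().  Returns NULL on allocation failure.
 */
static char *make_waiting_status(int *len_out)
{
    int         len;
    const char *old_status = fake_get_ps_display(&len);
    /* len bytes of old status + 8 for " waiting" + 1 for the NUL */
    char       *new_status = malloc((size_t) len + 8 + 1);

    if (new_status == NULL)
        return NULL;
    /* The only step that trusts the (pointer, length) pair we were given. */
    memcpy(new_status, old_status, (size_t) len);
    /* Writes exactly 9 bytes (8 chars + NUL), always inside the buffer. */
    strcpy(new_status + len, " waiting");
    assert(strlen(new_status) == (size_t) len + 8);

    *len_out = len;
    return new_status;
}
```

A caller would use the result, then do `s[len] = '\0'` to drop the suffix, exactly as the original snippet does, and `free(s)` when finished.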
Line 1831 is the strcpy, but it seems entirely impossible for that to fail
unless palloc has shirked its job. I'm thinking that the crash is really
in the memcpy: judging by the other lines in your trace, it seems common
for the trace to finger the line after the actual call.
What that'd have to imply is that get_ps_display() messed up,
returning a bad pointer or a bad length.
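To make that failure mode concrete, here is a hedged sketch of the same append with a sanity check in front of the `memcpy()`. `append_waiting_checked()` and `HYPOTHETICAL_PS_CAPACITY` are invented for illustration and are not PostgreSQL names. A check like this can only catch gross cases such as a NULL pointer or an out-of-range length; a dangling non-NULL pointer would still fault inside `memcpy()`, which is consistent with the crash landing there rather than in the `strcpy()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical capacity of the process-title buffer (an assumption). */
#define HYPOTHETICAL_PS_CAPACITY 256

/*
 * Same append as the quoted snippet, but refuse to copy when the
 * (pointer, length) pair is obviously bogus, returning NULL instead of
 * letting memcpy() read through a bad pointer or past the source buffer.
 */
static char *append_waiting_checked(const char *old_status, int len)
{
    char       *new_status;

    if (old_status == NULL || len < 0 || len > HYPOTHETICAL_PS_CAPACITY)
        return NULL;

    /* len bytes of old status + 8 for " waiting" + 1 for the NUL */
    new_status = malloc((size_t) len + 8 + 1);
    if (new_status == NULL)
        return NULL;
    memcpy(new_status, old_status, (size_t) len);
    strcpy(new_status + len, " waiting");
    return new_status;
}
```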
A platform-specific problem in get_ps_display() seems plausible
enough. The apparent connection to a concurrent VACUUM FULL seems
pretty hard to explain that way ... but maybe that's a mirage.
regards, tom lane