| From: | Andres Freund <andres(at)anarazel(dot)de> | 
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> | 
| Subject: | logical replication worker accesses catalogs in error context callback | 
| Date: | 2021-01-06 02:02:29 | 
| Message-ID: | 20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi,
Due to a debug ereport I just noticed that worker.c's
slot_store_error_callback is doing something quite dangerous:
static void
slot_store_error_callback(void *arg)
{
	SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
	LogicalRepRelMapEntry *rel;
	char	   *remotetypname;
	Oid			remotetypoid,
				localtypoid;
	/* Nothing to do if remote attribute number is not set */
	if (errarg->remote_attnum < 0)
		return;
	rel = errarg->rel;
	remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum];
	/* Fetch remote type name from the LogicalRepTypMap cache */
	remotetypname = logicalrep_typmap_gettypname(remotetypoid);
	/* Fetch local type OID from the local sys cache */
	localtypoid = get_atttype(rel->localreloid, errarg->local_attnum + 1);
	errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", "
			   "remote type %s, local type %s",
			   rel->remoterel.nspname, rel->remoterel.relname,
			   rel->remoterel.attnames[errarg->remote_attnum],
			   remotetypname,
			   format_type_be(localtypoid));
}
that's not code that can run in an error context callback. It's
theoretically possible (but unlikely) that
logicalrep_typmap_gettypname() is safe to run in an error context
callback. But get_atttype() definitely isn't.
get_attype() may do catalog accesses. That definitely can't happen
inside an error context callback - the entire transaction might be
borked at this point!
I think this was basically broken from the start in
commit 665d1fad99e7b11678b0d5fa24d2898424243cd6
Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
Date:   2017-01-19 12:00:00 -0500
Logical replication
but then got a lot worse in
commit 24c0a6c649768f428929e76dd7f5012ec9b93ce1
Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Date:   2018-03-14 21:34:26 -0300
logical replication: fix OID type mapping mechanism
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Masahiko Sawada | 2021-01-06 02:13:26 | Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion | 
| Previous Message | Michael Paquier | 2021-01-06 01:52:53 | Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help) |