Re: Interrupts vs signals

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Interrupts vs signals
Date: 2024-10-30 16:03:57
Message-ID: b3930251-78d5-4379-b213-b37b469f6dfb@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

One remaining issue with these patches is backwards-compatibility for
extensions:

1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be
changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt,
ResetInterrupt instead.

2. If an extension is defining its own Latch with InitLatch, rather than
using MyLatch, it's out of luck. You could probably rewrite it using the
INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a
bit more effort.

We can provide backwards compatibility macros and a new facility to
allocate custom interrupt bits. But how big of a problem is this anyway?
In an in-person chat last week, Andres said something like "this will
break every extension". I don't think it's quite that bad, and more
complicated extensions need small tweaks in every major release anyway.
But let's try to quantify that.

Analysis
--------

Joel compiled a list of all extensions that are on github:
https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47. I
did "git checkout" on all of them, 1159 repositories in total (some of
the links on the list were broken, or required authentication).

86 out of those 1159 extensions contain the word "Latch":

$ ls -d */* | xargs -IFOO bash -c "grep -q -r -I Latch FOO && echo
FOO" | wc -l
86

For comparison, in PostgreSQL v10, we added a new 'wait_event_info'
argument to WaitLatch and WaitLatchForSocket. That breakage was on
similar scale:

$ ls -d */* | xargs -IFOO bash -c "grep -q -r -I WaitLatch FOO &&
echo FOO" | wc -l
71

Admittedly that was a long time ago, we might have different priorities now.

Deeper analysis
---------------

Most of those calls are simple calls to SetLatch(MyLatch),
WaitLatch(MyLatch, ...) etc. that could be handled by backwards
compatibility macros. Many of the calls don't use MyLatch, but use
MyProc->procLatch directly:

rc = WaitLatch(&MyProc->procLatch, ...)

That makes sense, since MyLatch was added in 2015, while the initial
latch code was added in 2010. Any extensions that lived during that era
would use MyProc->procLatch for backwards-compatibility, and there's
been no need to change that since. That could also be supported by
backwards-compatibility macros, or we could tell the extension authors
to search & replace MyProc->procLatch to MyLatch.

Excluding those, there are a handful of extensions that would need to be
updated. Below is a quick analysis of the remaining results of "grep
Latch" in all the repos:

These calls use the process latch to wake up a different process (I
excluded some duplicated forks of the same extension):

2ndQuadrant/pglogical/pglogical_sync.c:
SetLatch(&apply->proc->procLatch);
2ndQuadrant/pglogical/pglogical_apply.c:
SetLatch(&worker->proc->procLatch);
2ndQuadrant/pglogical/pglogical_worker.c:
SetLatch(&w->proc->procLatch);
2ndQuadrant/pglogical/pglogical_worker.c:
SetLatch(&PGLogicalCtx->supervisor->procLatch);
heterodb/pg-strom/src/gpu_cache.c: Latch *backend;
heterodb/pg-strom/src/gpu_cache.c: cmd->backend = (is_async ? NULL
: MyLatch);
heterodb/pg-strom/src/gpu_cache.c:
SetLatch(cmd->backend);
heterodb/pg-strom/src/gpu_cache.c:
SetLatch(cmd->backend);
citusdata/citus/src/backend/distributed/utils/maintenanced.c: Latch
*latch; /* pointer to the background worker's latch */
citusdata/citus/src/backend/distributed/utils/maintenanced.c:
SetLatch(dbData->latch);
liaorc/devel-master/src/cuda_program.c:
SetLatch(&proc->procLatch);
okbob/generic-scheduler/dbworker.c:
SetLatch(&dbworker->parent->procLatch);
postgrespro/lsm3/lsm3.c:
SetLatch(&entry->merger->procLatch);
postgrespro/lsm3/lsm3.c:
SetLatch(&entry->merger->procLatch);
postgrespro/pg_pageprep/pg_pageprep.c:
SetLatch(&starter->procLatch);
postgrespro/pg_pageprep/pg_pageprep.c:
SetLatch(&starter->procLatch);
postgrespro/pg_query_state/pg_query_state.c: Latch *caller;
postgrespro/pg_query_state/pg_query_state.c:
SetLatch(counterpart_userid->caller);
postgrespro/pg_query_state/pg_query_state.c:
counterpart_userid->caller = MyLatch;
timescale/timescaledb/src/loader/bgw_message_queue.c:
SetLatch(&BackendPidGetProc(queue_get_reader(queue))->procLatch);
troels/hbase_fdw/process_communication.c: control->latch = MyLatch;
troels/hbase_fdw/process_communication.c:
SetLatch(control->latch);

The above could easily be fairly replaced with the new SendInterrupt()
function. A backwards compatibility macro might be possible for some of
these, but would be tricky for others. So I think these extensions will
need to adapt by switching to SendInterrupt(). At quick glance, it would
easy for all of these to do that with a small "#if PG_VERSION_NUM" block.

postgrespro/pg_wait_sampling/pg_wait_sampling.c: * null wait
events. So instead we make use of DisownLatch() resetting
postgrespro/pg_wait_sampling/pg_wait_sampling.c: if (proc->pid ==
0 || proc->procLatch.owner_pid == 0 || proc->pid == MyProcPid)
postgrespro/pg_wait_sampling/pg_wait_sampling.c:
SetLatch(pgws_collector_hdr->latch);
postgrespro/pg_wait_sampling/pg_wait_sampling.c:
SetLatch(pgws_collector_hdr->latch);
postgrespro/pg_wait_sampling/pg_wait_sampling.h: Latch
*latch;

This case is similar to the previous group: a latch is used to wake up
another process. It peeks directly into procLatch.owner_pid however. I
suspect this will be simpler and better after rewriting to use
interrupts, but not sure.

darold/datalink/datalink_bgw.c: (errmsg("Latch
status before waitlatch call: %d", MyProc->procLatch.is_set)));

This is a DEBUG1 message. Could probably be just removed, it doesn't
seem very useful.

postgrespro/jsonbd/jsonbd.c: SetLatch(&wd->latch);
postgrespro/jsonbd/jsonbd.h: Latch latch;
postgrespro/jsonbd/jsonbd.h: Latch
launcher_latch;
postgrespro/jsonbd/jsonbd_worker.c: SetLatch(&hdr->launcher_latch);
postgrespro/jsonbd/jsonbd_worker.c: InitLatch(&worker_state->latch);
postgrespro/jsonbd/jsonbd_worker.c: InitLatch(&hdr->launcher_latch);
postgrespro/jsonbd/jsonbd_worker.c: rc = WaitLatch(MyLatch,
WL_LATCH_SET | WL_POSTMASTER_DEATH,
postgrespro/jsonbd/jsonbd_worker.c: ResetLatch(MyLatch);
postgrespro/jsonbd/jsonbd_worker.c: rc =
WaitLatch(&worker_state->latch, WL_LATCH_SET | WL_POSTMASTER_DEATH,
postgrespro/jsonbd/jsonbd_worker.c:
ResetLatch(&worker_state->latch);
postgrespro/jsonbd/jsonbd_worker.c: WaitLatch(&hdr->launcher_latch,
WL_LATCH_SET, 0, PG_WAIT_EXTENSION);
postgrespro/jsonbd/jsonbd_worker.c: WaitLatch(&hdr->launcher_latch,
WL_LATCH_SET, 0);
postgrespro/jsonbd/jsonbd_worker.c: ResetLatch(&hdr->launcher_latch);
timescale/timescaledb/test/src/bgw/params.h: Latch timer_latch;
timescale/timescaledb/test/src/bgw/params.c:
SetLatch(&wrapper->params.timer_latch);
timescale/timescaledb/test/src/bgw/params.c:
InitLatch(&wrapper->params.timer_latch);
timescale/timescaledb/test/src/bgw/params.c:
ResetLatch(&wrapper->params.timer_latch);
timescale/timescaledb/test/src/bgw/params.c:
WaitLatch(&wrapper->params.timer_latch,

These two extensions, 'jsonbd' and 'timescaledb', actually use InitLatch
to define their own latches.

I think the 'jsonbd' usage was copied from logical apply workers. At
quick glance, it probably should be using the process latch instead of
creating its own.

In 'timescaledb', the InitLatch call is in a mock test module. It
probably could be written differently..

Summary
-------

We have a few options for how to deal with backwards-compatibility for
extensions:

A) If we rip off the bandaid in one go and don't provide any
backwards-compatibility macros, we will break 96 extensions. Most of
them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with
corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With
#ifdef for compatibility with older postgres versions)

B) If we provide backwards-compatibility macros so that simple Latch
calls on MyLatch continue working, we will break about 14 extensions.
They will need some tweaking like in option A). A bit more than the
simple cases in option A), but nothing too difficult.

C) We could provide "forward-compatibility" macros in a separate header
file, to make the new "SetInterrupt" etc calls work in old PostgreSQL
versions. Many of the extensions already have a header file like this,
see e.g. citusdata/citus/src/include/pg_version_compat.h,
pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea
to provide a semi-official header file like this on the Postgres wiki,
to help extension authors. It would encourage extensions to use the
latest idioms, while still being able to compile for older versions.

I'm leaning towards option C). Let's rip off the band-aid, but provide
documentation for how to adapt your extension code. And provide a
forwards-compatibility header on the wiki, that extension authors can
use to make the new Interrupt calls work against old server versions.

The second question is whether we need to provide a replacement for
InitLatch for extensions. ISTM that none of the extensions out there
truly need it. But maybe if we had a nicer interface for allocating
interrupt bits for custom usage, they would actually find that handy.
But I'm leaning towards "no" at this point. Could be added later.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-10-30 16:16:06 Re: Pgoutput not capturing the generated columns
Previous Message Robert Haas 2024-10-30 15:30:57 Re: Eager aggregation, take 3