Re: Cleaning up threading code

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cleaning up threading code
Date: 2024-08-26 09:59:10
Message-ID: 1ca9b09d-5a01-4025-951d-b687d7865666@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21/08/2024 15:51, Thomas Munro wrote:
> On Sun, Apr 14, 2024 at 3:16 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> So I'll go ahead and add the storage class to the next version, and
>> contemplate a couple of different options for the tss stuff, including
>> perhaps leaving it out if that seems doable.
>
> Here is a new attempt at pg_threads.h. Still work in progress, but
> passing tests, with storage class and working TSS, showing various
> users.

Looks good, thanks!

> I eventually understood first that my TSS destructor problems on
> Windows came from mismatched calling conventions, and that you can't
> really trampoline your way around that, at least not without doing
> some pretty unreasonable things, and that is why nobody can emulate
> either tss_create() or pthread_key_create() directly with Windows'
> FlsAlloc(), so everybody who tries finishes up building their own
> infrastructure to track destructors, or in ECPG's case just leaks all
> the memory instead.
>
> Here's the simplest implementation I could come up with so far. I
> don't have Windows so I made it possible to use emulated TSS
> destructors on my local machine with a special macro for testing, and
> then argued with CI for a while until the other machines agreed.
> Otherwise, it's all a fairly thin wrapper and hopefully not suprising.

In the Windows implementation of pg_tss_create(), how do threads and
FlsAlloc() interact? If I understand correctly, one thread can run
multiple "fibers" in a co-operative fashion. It's a legacy feature, not
widely used nowadays, but what will happen if someone does try to use
fibers? I think that will result in chaos. You're using FlsAlloc() to
install the destructor callback but TlsAlloc() for allocating the actual
thread-local values. So all fibers running on the same thread will share
the storage, but the destructor will be called whenever any of the
fibers exit, clearing the TSS storage for all fibers.

I don't know much about Windows or fibers but mixing the Tls* and Fls*
functions seems unwise.

To be honest, I think we should just accept the limitation that TSS
destructors don't run on Windows. Yes, it means we'll continue to leak
memory on ecpg, but that's not a new issue. We can address that as a
separate patch later, if desired. Or more likely, do nothing until C11
threads with proper destructors become widely available on Windows.

> One thing this would need to be complete, at least the way I've
> implemented it, is memory barriers, for non-TSO hardware, which would
> require lifting the ban on atomics.h in frontend code, or at least
> parts of it.

+1 on doing that. Although it becomes moot if you just give up on the
destructors on Windows.

> Only 64 bit emulation is actually tied to the backend
> now (because it calls spinlock stuff, that itself is backend-only, but
> also it doesn't actually need to be either). Or maybe I can figure
> out a different scheme that doesn't need that. Or something...

You could use a pg_mtx now instead of a spinlock. I wonder if there are
any supported platforms left that don't have 64-bit atomics though.

> + * We have some extensions of our own, not present in C11:
> + *
> + * - pg_rwlock_t for read/write locks
> + * - pg_mtx_t static initializer PG_MTX_STATIC_INIT
> + * - pg_barrier_t

Hmm, I don't see anything wrong with those extensions as such, but I
wonder how badly we really need them?

pg_rwlock is used by:
- Windows implementation of pg_mtx_t, to provide PG_MTX_STATIC_INIT
- the custom Thread-Specific Storage implementation (i.e. Windows)

PG_MTX_STATIC_INIT is used by:
- ecpg
- libpq

pg_barrier_t is used by:
- pgbench

pg_rwlock goes away if you give up on the TSS destructors on Windows,
and implement pg_mtx directly with SRWLOCK on Windows.

The barrier implementation could easily be moved into pgbench, if we
don't expect to need it in other places soon. Having a generic
implementation seems fine too though, it's pretty straightforward.

PG_MTX_STATIC_INIT seems hard to get rid of. I suppose you could use
call_once() to ensure it's initialized only once, but a static
initializer is a lot nicer to use.

> diff --git a/src/interfaces/ecpg/ecpglib/ecpglib_extern.h b/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
> index 01b4309a71..d8416b19e3 100644
> --- a/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
> +++ b/src/interfaces/ecpg/ecpglib/ecpglib_extern.h
> @@ -169,7 +169,7 @@ bool ecpg_get_data(const PGresult *, int, int, int, enum ECPGttype type,
> enum ECPGttype, char *, char *, long, long, long,
> enum ARRAY_TYPE, enum COMPAT_MODE, bool);
>
> -void ecpg_pthreads_init(void);
> +#define ecpg_pthreads_init()
> struct connection *ecpg_get_connection(const char *connection_name);
> char *ecpg_alloc(long size, int lineno);
> char *ecpg_auto_alloc(long size, int lineno);

Is it safe to remove the function, or might it be referenced by existing
binaries that were built against an older ecpglib version? I believe
it's OK to remove, it's not part of the public API and should not be
called directly from a binary. But in that case, we don't need the dummy
#define either.

> +/*-------------------------------------------------------------------------
> + *
> + * Barriers. Not in C11. Apple currently lacks the POSIX version.
> + * We assume that the OS might know a better way to implement it that
> + * we do, so we only provide our own if we have to.
> + *
> + *-------------------------------------------------------------------------
> + */
> +
> +#ifdef PG_THREADS_WIN32
> +typedef SYNCHRONIZATION_BARRIER pg_barrier_t;
> +#elif defined(HAVE_PTHREAD_BARRIER)
> +typedef pthread_barrier_t pg_barrier_t;
> +#else
> +typedef struct pg_barrier_t
> +{
> + bool sense;
> + int expected;
> + int arrived;
> + pg_mtx_t mutex;
> + pg_cnd_t cond;
> +} pg_barrier_t;
> +#endif

I'd suggest calling this pg_thrd_barrier_t or even pg_thread_barrier_t.
It's a little more verbose, but would rhyme with pthread_barrier_t. And
to avoid confusing it with memory barriers and ProcSignal barriers.

Comments on the TSS implementation follow, which largely become moot if
you give up on the destructor support on Windows:

> +void
> +pg_tss_dtor_delete(pg_tss_t tss_id)

Should this be pg_tss_delete(), since the C11 function is tss_delete()?
Or is this different? There are actually no callers currently, so maybe
just leave it out. pg_tss_dtor_delete() also seems racy, if run
concurrently with pg_tss_run_destructors().

> + * Make sure our destructor hook is registered with the operating system
> + * in this process. This happens only once in the whole process. Making
> + * sure it will run actually in each thread happens in
> + * pg_tss_ensure_destructors_will_run().

the function is called pg_tss_ensure_destructors_in_this_thread(), not
pg_tss_ensure_destructors_will_run().

> +/*
> + * Called every time pg_tss_set() installs a non-NULL value.
> + */
> +void
> +pg_tss_ensure_destructors_in_this_thread(void)
> +{
> + /*
> + * Pairs with pg_tss_install_run_destructors(), called by pg_tss_create().
> + * This makes sure that we know if the tss_id being set could possibly
> + * have a destructor. We don't want to pay the cost of checking, but we
> + * can check with a simple load if *any* tss_id has a destructor. If so,
> + * we make sure that pg_tss_destructor_hook has a non-NULL value in *this*
> + * thread, because both Windows and POSIX will only call a destructor for
> + * a non-NULL value.
> + */
> + pg_read_barrier();
> + if (pg_tss_run_destructors_installed)
> + {
> +#ifdef PG_THREADS_WIN32
> + if (FlsGetValue(pg_tss_destructor_hook) == NULL)
> + FlsSetValue(pg_tss_destructor_hook, (void *) 1);
> +#else
> + if (pthread_getspecific(pg_tss_destructor_hook) == NULL)
> + pthread_setspecific(pg_tss_destructor_hook, (void *) 1);
> +#endif
> + }
> +}
> +#endif

I believe this cannot get called if !pg_tss_run_destructors_installed,
no need to check that. Because pg_tss_create() will fail
pg_tss_run_destructors is set. Maybe turn it into an Assert. Or
alternatively, skip the call to pg_tss_install_run_destructors when
pg_tss_create() is called with destructor==NULL. I think that may have
been the idea here, based on the comment above.

The write barriers in pg_tss_install_run_destructors() seem excessive.
The function will only ever run in one thread, protected by
pg_call_once(). pg_call_once() surely provides all the necessary barriers.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2024-08-26 10:30:28 Re: [PATCH] Add CANONICAL option to xmlserialize
Previous Message Amit Kapila 2024-08-26 09:52:01 Re: Conflict detection and logging in logical replication