Re: Let's make PostgreSQL multi-threaded

From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Let's make PostgreSQL multi-threaded
Date: 2023-06-07 22:09:19
Message-ID: 20230607220919.pilzcbqnp2rwfbl4@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-06-06 16:14:41 -0400, Greg Stark wrote:
> I think of processes and threads as fundamentally the same things,
> just a slightly different API -- namely that in one memory is by
> default unshared and needs to be explicitly shared and in the other
> it's default shared and needs to be explicitly unshared.

In theory that's true, in practice it's entirely wrong.

For one, the amount of complexity you need to deal with to share state across
processes, post fork, is *substantial*. You can share file descriptors across
processes, but it's extremely platform dependent, requires cooperation between
both processes etc. You can share memory allocations made after the processes
forked, but you're typically not going to be able to guarantee they're at the
same pointer values. Etc.
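
To make the pointer-value problem concrete: a raw pointer stored in a segment
that was mapped after fork is only meaningful in the process that wrote it, so
shared data structures end up storing offsets from the segment base instead. A
minimal sketch, with hypothetical names (`relptr`, `relptr_resolve`,
`relptr_make`, `list_sum`, `demo_two_mappings`), using two static arrays to
stand in for the same segment mapped at two different addresses:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a byte offset from the segment base; 0 means NULL. */
typedef size_t relptr;

typedef struct node
{
    int    value;
    relptr next;    /* offset of the next node, not a raw pointer */
} node;

/* Turn an offset into an address inside this process's mapping. */
static void *relptr_resolve(void *base, relptr off)
{
    return off == 0 ? NULL : (void *) ((char *) base + off);
}

/* Turn an address inside this process's mapping into an offset. */
static relptr relptr_make(void *base, void *ptr)
{
    return ptr == NULL ? 0 : (size_t) ((char *) ptr - (char *) base);
}

/* Walk the list that starts at offset head and sum its values. */
static int list_sum(void *base, relptr head)
{
    int sum = 0;

    for (node *n = relptr_resolve(base, head); n != NULL;
         n = relptr_resolve(base, n->next))
        sum += n->value;
    return sum;
}

/* Simulate one segment that is visible at two different base addresses. */
static int demo_two_mappings(void)
{
    /* Slot 0 stays unused so that offset 0 can represent NULL. */
    static node seg_a[4], seg_b[4];
    relptr head;

    seg_a[1].value = 1;
    seg_a[1].next = relptr_make(seg_a, &seg_a[2]);
    seg_a[2].value = 2;
    seg_a[2].next = relptr_make(seg_a, &seg_a[3]);
    seg_a[3].value = 3;
    seg_a[3].next = 0;
    head = relptr_make(seg_a, &seg_a[1]);

    /* Same bytes, different address: "the other process's mapping". */
    memcpy(seg_b, seg_a, sizeof(seg_a));

    assert(list_sum(seg_a, head) == 6);
    assert(list_sum(seg_b, head) == 6);
    return list_sum(seg_b, head);
}
```

Calling `demo_two_mappings()` walks the copied list through the second base
address. The same walk over raw `node *` links would follow pointers back into
the first array, which is the failure mode when two processes map a segment at
different addresses. Every access pays for the extra base-plus-offset
arithmetic, and none of that is needed when all threads share one mapping.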

But more importantly, there are crucial performance differences between threads
and processes. Having the same memory mapping between threads allows the
hardware to share the TLB (on x86 via process context identifiers), which
isn't realistically possible with different processes.

> However all else is not equal. The discussion in the hallway turned to
> whether we could just use pthread primitives like mutexes and
> condition variables instead of our own locks -- and the point was
> raised that those libraries assume these objects will be in threads of
> one process not shared across completely different processes.

Independent of threads vs processes, I am -many on using pthread mutexes and
condition variables. From experiments, that *loses* performance, and we lose
a lot of control and increase cross-platform behavioural differences. I also
don't see any benefit in going in that direction.

> And that's probably not the only library we're stuck reimplementing
> because of this. So the question is are these things worth taking the
> risk of having data structures shared implicitly and having unclear
> ownership rules?
>
> I was going to say supporting both modes relieves that fear since it
> would force that extra discipline and allow testing under the more
> restrictive rule. However I don't think that will actually work. As
> long as we support both modes we lose all the advantages of threads.

I don't think that has to be true. We could e.g. eventually decide that we
don't support parallel query without threading support - which would allow us
to get rid of a very significant amount of code and runtime overhead.

Greetings,

Andres Freund
