Re: DSM segment handle generation in background workers

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DSM segment handle generation in background workers
Date: 2018-11-14 04:50:26
Message-ID: CAEepm=1COuNcj_va0Qt2KTEE4rDvhWCixZ-pyh8iUbrusEVBzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 14, 2018 at 3:24 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Mon, Nov 12, 2018 at 09:39:01AM -0500, Tom Lane wrote:
> > Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> > > On Mon, Nov 12, 2018 at 9:34 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > >> Compared to the old code, the new code requires more wall time to visit every
> > >> possible seed value. New code xor's the PID bits into the fastest-changing
> > >> timestamp bits, so only about twenty bits can vary within any given one-second
> > >> period. (That assumes a PID space of twenty or fewer bits; fifteen bits is
> > >> the Linux default.) Is that aspect of the change justified?
> >
> > > Hmm, right. How about applying pg_bswap32() to one of the terms, as
> > > an easy approximation of reversing the bits?
>
> That seems adequate, yes. But why not just restore the old algorithm in the
> new function? (I'm not opposed to revising the algorithm, but I think a
> revision should have explicit goals. The ease of pg_bswap32() is not itself a
> worthy goal, because the old BackendRun() algorithm was also quite easy.)

Ok. I rewrote it only because in the new coding I had a TimestampTz
rather than the separate sec and usec components of the old code. I
didn't intend to revise the algorithm fundamentally, but I missed the
significance of the bit shifting from commit 98c50656c that moved the
faster moving bits up. I'll change it back.

> > I doubt that's a good idea; to a first approximation, it would mean that
> > half the seed depends only on the PID and the other half only on the
> > timestamp. Maybe we could improve matters a little by left-shifting the
> > PID four bits or so, but I think we still want it to mix with some
> > rapidly-changing time bits.
> >
> > I'm not really sure that we need to do anything though. Basically,
> > what we've got here is a tradeoff between how many bits change over
> > a given timespan and how unpredictable those bits are. I don't see
> > that one of those is necessarily more important than the other.
>
> What counts is the ease of predicting a complete seed. HEAD's algorithm has
> ~13 trivially-predictable bits, and the algorithm that stood in BackendRun()
> from 98c5065 until 197e4af had no such bits. You're right that the other 19
> bits are harder to predict than any given 19 bits under the old algorithm, but
> the complete seed remains more predictable than it was before 197e4af.

However we mix them, given that the source code is well known, isn't
an attacker's job really to predict the time and pid, two not
especially well guarded secrets? I don't see how the mixing matters
in that respect. (You can also just clobber the seed from SQL, but
that'd be cheating.)

> Goal I recommend: if connections arrive in a Poisson distribution of unknown
> lambda, maximize the number of distinct seeds.

Yeah. I think bit reversing one of them achieves the maximum number
of distinct seeds by putting the busy ends far apart, but the original
bit shifting is similar.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-11-14 05:03:11 Re: Race condition in WaitForBackgroundWorkerStartup
Previous Message Kuntal Ghosh 2018-11-14 04:45:43 In-place updates and serializable transactions