Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, alex work <alexwork033(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Date: 2024-03-26 18:16:03
Message-ID: 2999167.1711476963@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> I spent some time trying to get some ballpark figures but have thus far
> been unsuccessful. Even if I was able to get good numbers, I'm not sure
> how much they'd help us, as we'll still need to decide how much overhead we
> are willing to take in comparison to the linear search. I don't think
> ~1000 is an unreasonable starting point, as it seems generally more likely
> that you will have many more roles to process at that point than if the
> threshold was, say, 100. And if the threshold is too high (e.g., 10,000),
> this optimization will only kick in for the most extreme cases, so we'd
> likely be leaving a lot on the table. But, I will be the first to admit
> that my reasoning here is pretty unscientific, and I'm open to suggestions
> for how to make it less so.

I did a little experimentation using the attached quick-hack C
function, and came to the conclusion that setting up the bloom filter
costs more or less as much as inserting 1000 or so OIDs the dumb way.
So we definitely want a threshold that's not much less than that.
For example, with ROLES_LIST_BLOOM_THRESHOLD = 100 I saw:

regression=# select drive_bloom(100, 10, 100000);
drive_bloom
-------------

(1 row)

Time: 319.931 ms
regression=# select drive_bloom(101, 10, 100000);
drive_bloom
-------------

(1 row)

Time: 319.385 ms
regression=# select drive_bloom(102, 10, 100000);
drive_bloom
-------------

(1 row)

Time: 9904.786 ms (00:09.905)

That's a pretty big jump in context. With the threshold set to 1024,

regression=# select drive_bloom(1024, 10, 100000);
drive_bloom
-------------

(1 row)

Time: 14597.510 ms (00:14.598)
regression=# select drive_bloom(1025, 10, 100000);
drive_bloom
-------------

(1 row)

Time: 14589.197 ms (00:14.589)
regression=# select drive_bloom(1026, 10, 100000);
drive_bloom
-------------

(1 row)

Time: 25947.000 ms (00:25.947)
regression=# select drive_bloom(1027, 10, 100000);
drive_bloom
-------------

(1 row)

Time: 25399.718 ms (00:25.400)
regression=# select drive_bloom(2048, 10, 100000);
drive_bloom
-------------

(1 row)

Time: 33809.536 ms (00:33.810)

So I'm now content with choosing a threshold of 1000 or 1024 or so.

As for the bloom filter size, I see that bloom_create does

bitset_bytes = Min(bloom_work_mem * UINT64CONST(1024), total_elems * 2);
bitset_bytes = Max(1024 * 1024, bitset_bytes);

which means that any total_elems input less than 512K is disregarded
altogether. So I'm not sold on your "ROLES_LIST_BLOOM_THRESHOLD * 10"
value. Maybe it doesn't matter though.

I do not like, even a little bit, your use of a static variable to
hold the bloom filter pointer. That code will misbehave horribly
if we throw an error partway through the role-accumulation loop;
the next call will try to carry on using the old filter, which would
be wrong even if it still existed which it likely won't. It's not
that much worse notationally to keep it as a local variable, as I
did in the attached.

regards, tom lane

Attachment Content-Type Size
drive_bloom.c text/x-c 2.8 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nathan Bossart 2024-03-26 18:48:19 Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Previous Message Nathan Bossart 2024-03-26 16:59:18 Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-03-26 18:48:19 Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Previous Message Tomas Vondra 2024-03-26 18:09:45 Re: pg_upgrade --copy-file-range