Re: Patch: fix lock contention for HASHHDR.mutex

From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: fix lock contention for HASHHDR.mutex
Date: 2016-02-08 10:39:37
Message-ID: 20160208133937.648f1fcf@fujitsu
Lists: pgsql-hackers

Hello, Robert.

> So: do we have clear evidence that we need 128 partitions here, or
> might, say, 16 work just fine?

Yes, this number of partitions was chosen based on the following
benchmark (see the "spinlock array" column):

http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu

In fact we can choose 16, 32, 64 or 128 partitions - any of these
numbers works better than the current implementation with a single
spinlock. But according to this benchmark, 128 partitions give twice
as much TPS as 16 partitions, almost as if we didn't have any locks
at all (see the "no locks" column).

Still, I'm a bit concerned that 128 partitions require
(128-16) * (sizeof(slock_t) + sizeof(void*) + sizeof(long)) bytes of
additional memory per hash table, which is:

(gdb) p (128-16)*(sizeof(slock_t) + sizeof(long) + sizeof(void*))
$1 = 1904

... bytes of shared memory on amd64.
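
To make the arithmetic a bit more concrete, here is a rough sketch of
the per-partition data the calculation assumes - one spinlock, one
entry counter and one free list head per partition. The struct, field
and constant names below are mine, not necessarily the ones used in
the patch:

#include <stdio.h>

typedef unsigned char slock_t;             /* assumed 1 byte, as on amd64 */
typedef struct HASHELEMENT HASHELEMENT;    /* opaque for this sketch */

#define NUM_FREELISTS 128                  /* hypothetical constant name */

typedef struct
{
    slock_t      mutex[NUM_FREELISTS];     /* per-partition spinlock */
    long         nentries[NUM_FREELISTS];  /* per-partition entry count */
    HASHELEMENT *freeList[NUM_FREELISTS];  /* per-partition free list head */
    /* ... remaining HASHHDR fields ... */
} HashHdrSketch;

int
main(void)
{
    /* extra bytes compared to 16 partitions: 112 * 17 = 1904 on amd64 */
    printf("%zu\n",
           (NUM_FREELISTS - 16) *
           (sizeof(slock_t) + sizeof(long) + sizeof(HASHELEMENT *)));
    return 0;
}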

I'm not sure whether this is considered OK. If it is, I suggest
leaving the number at 128. If it's not, perhaps we could agree on,
say, 64 partitions as a compromise. The benchmark described above
doesn't represent a realistic workload anyway. I personally believe
we don't have so many small hash tables that 1.8 KB of additional
shared memory per hash table would be an issue.

> I don't really want to increase the number of lock manager partition
> locks to 128 just for the convenience of this patch, and even less do
> I want to impose the requirement that every future shared hash table
> must use an equal number of partitions.

May I ask why? Does it create a real problem?

> I don't see any reason why the number of free list partitions needs
> to be a constant, or why it needs to be equal to the number of hash
> bucket partitions. That just seems like a completely unnecessary
> constraint. You can take the hash value, or whatever else you use to
> pick the partition, modulo any number you want.

True, the number of spinlocks / free lists could be a variable. I
believe it's the good old simplicity vs. flexibility question.

Let's say we allow the user (i.e. the calling side) to choose the
number of spinlocks and free lists. Then we have to use the formula
freeList[hash % items_number], which means a division instead of a
binary shift. Do we really need such an expensive operation just to
calculate the index of a free list? Probably not. Let's limit the
possible items_number values to powers of two. Now there are only
four values the user would be likely to choose (16, 32, 64, 128),
which is not that flexible anymore.
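
Just to illustrate the difference (the helper names below are made up
for this example and are not from the patch): an arbitrary
items_number forces an integer division, while a power-of-two count
reduces to a single bitwise AND:

#include <stdint.h>

#define NUM_FREELISTS 128   /* must be a power of two for the mask to work */

/* Arbitrary partition count: requires an integer division. */
static inline uint32_t
freelist_idx_modulo(uint32_t hashcode, uint32_t items_number)
{
    return hashcode % items_number;
}

/* Power-of-two partition count: a single bitwise AND. */
static inline uint32_t
freelist_idx_mask(uint32_t hashcode)
{
    return hashcode & (NUM_FREELISTS - 1);
}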

What about memory? If we allocate mutex, nentries and freeList
dynamically depending on the items_number value, HASHHDR has to store
pointers to the allocated memory, which means an additional
indirection for almost every access to the mutex, nentries and
freeList fields. Probably a bad idea. Otherwise we still waste shared
memory, probably even more, since we wouldn't want the maximum
items_number value to be too low.
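
A run-time-sized layout would look roughly like the sketch below
(hypothetical names again). Every access to a spinlock, counter or
free list head then goes through a pointer stored in the header
instead of an array embedded in it:

typedef unsigned char slock_t;             /* assumed 1 byte, as on amd64 */
typedef struct HASHELEMENT HASHELEMENT;

/* Arrays sized at hash table creation time and allocated separately;
   the price is one extra dereference on almost every access. */
typedef struct
{
    int           num_freelists;    /* chosen by the calling side */
    slock_t      *mutex;            /* -> num_freelists spinlocks */
    long         *nentries;         /* -> num_freelists counters */
    HASHELEMENT **freeList;         /* -> num_freelists list heads */
    /* ... remaining HASHHDR fields ... */
} HashHdrDynamicSketch;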

Etc.

I believe such a change would only add code complexity (the code in
dynahash.c is already far from trivial), making it much harder to
change anything next time, and would probably introduce bugs we will
miss, while not solving any real problem. Why not just choose a
constant that seems to be good enough instead?
