From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: fix lock contention for HASHHDR.mutex
Date: 2016-02-08 14:37:14
Message-ID: CA+TgmoZKfuJog2MoZrmPsrVh4b8owruPU3+Nm_7bMgVcc8U3WQ@mail.gmail.com
Lists: pgsql-hackers
On Mon, Feb 8, 2016 at 5:39 AM, Aleksander Alekseev
<a(dot)alekseev(at)postgrespro(dot)ru> wrote:
>> So: do we have clear evidence that we need 128 partitions here, or
>> might, say, 16 work just fine?
>
> Yes, this number of partitions was chosen based on this benchmark (see
> "spinlock array" column):
>
> http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu
It's very hard to understand what that email is trying to say. It
uses terms like PN and AN without defining them. There's certainly
not enough detail for somebody else to replicate what you did.
> In fact we can choose 16, 32, 64 or 128 partitions - any number works
> better than the current implementation with a single spinlock. But
> according to this benchmark, 128 partitions give twice as much TPS as
> 16 partitions, almost as if we didn't have any locks at all (see the
> "no locks" column).
OK.
> Still I'm a bit concerned that 128 partitions require (128-16)*
> ( sizeof(slock_t) + sizeof(void*) + sizeof(long) ) bytes of additional
> memory per hash table which is:
>
> (gdb) p (128-16)*(sizeof(slock_t) + sizeof(long) + sizeof(void*))
> $1 = 1904
>
> ... bytes of shared memory on amd64.
>
> I'm not sure if this is considered OK. If it is, I suggest we keep
> 128. If it's not, perhaps we could agree on, say, 64 partitions as a
> compromise. The benchmark described doesn't represent any realistic
> workload anyway. I personally believe we don't have so many small
> hash tables that 1.8 KB of additional shared memory per hash table
> would be an issue.
I don't particularly care about using slightly more shared memory per
hash table.
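For reference, the arithmetic quoted there just sums the sizes of three
per-freelist fields, something along these lines (a sketch with a one-byte
stand-in for slock_t, not the patch's actual declarations):

/*
 * Illustrative only: the three per-freelist fields being multiplied in
 * the gdb calculation above.
 */
typedef unsigned char slock_t;      /* stand-in for PostgreSQL's spinlock
                                     * type; 1 byte on amd64 */

typedef struct
{
    slock_t     mutex;              /* spinlock protecting this freelist */
    long        nentries;           /* entries accounted to this freelist */
    void       *freeList;           /* head of the chain of free elements */
} FreeListSketch;                   /* members sum to 17 bytes on amd64;
                                     * sizeof() is 24 once padding is added */

At roughly 17 bytes apiece, going from 16 freelists to 128 is the 1904
bytes computed above, which is indeed negligible.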
>> I don't really want to increase the number of lock manager partition
>> locks to 128 just for the convenience of this patch, and even less do
>> I want to impose the requirement that every future shared hash table
>> must use an equal number of partitions.
>
> May I ask why? Does it create a real problem?
It's hard to be 100% sure about the future, but suppose that in
another 10 years we discover that the buffer mapping hash table now
needs to have 1024 partitions, but the lock manager partition hash
table still only needs 16. Do we really want to drag the lock manager
and any future shared hash tables up to 1024 just because the buffer
manager needs it? That sounds like a bad idea from here. For
example, one advantage of having fewer partitions is that it takes
less time to lock them all. That matters to - for example - the cost
of running the deadlock detector. Is the cost enough to matter in
that particular case with the code as it stands? I don't know. Could
having more partitions ever be worse than having fewer? I'm pretty
sure that the answer is yes.
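To make the "lock them all" cost concrete, here is a self-contained sketch
with pthread mutexes standing in for the partition LWLocks (the names and
structure are mine, not lock.c's):

/*
 * Sketch only, not PostgreSQL code: acquire every partition lock, as the
 * deadlock detector must.  The cost is linear in the partition count.
 */
#include <pthread.h>

#define NUM_PARTITIONS 16           /* imagine being forced up to 1024 */

static pthread_mutex_t partition_lock[NUM_PARTITIONS];

static void
lock_all_partitions(void)
{
    /* Always acquire in index order so concurrent callers can't deadlock. */
    for (int i = 0; i < NUM_PARTITIONS; i++)
        pthread_mutex_lock(&partition_lock[i]);
}

static void
unlock_all_partitions(void)
{
    for (int i = NUM_PARTITIONS - 1; i >= 0; i--)
        pthread_mutex_unlock(&partition_lock[i]);
}

int
main(void)
{
    for (int i = 0; i < NUM_PARTITIONS; i++)
        pthread_mutex_init(&partition_lock[i], NULL);

    lock_all_partitions();
    /* ... examine every partition's state, as a deadlock check must ... */
    unlock_all_partitions();
    return 0;
}

The sweep gets proportionally more expensive for every hash table that has
to share whatever constant we pick for the largest consumer.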
>> I don't see any reason why the number of free list partitions needs
>> to be a constant, or why it needs to be equal to the number of hash
>> bucket partitions. That just seems like a completely unnecessary
>> constraint. You can take the hash value, or whatever else you use to
>> pick the partition, modulo any number you want.
>
> True, the number of spinlocks / freeLists could be a variable. I
> believe it's the good old simplicity vs. flexibility question.
>
> Let's say we allow the user (i.e. the calling side) to choose the
> number of spinlocks and free lists. Then we would have to use the
> formula freeList[hash % items_number], which means a division instead
> of a binary shift. Do we really need such an expensive operation just
> to calculate the index of a free list? Probably not. Let's restrict
> the possible items_number values to powers of two. Now there are only
> four values a user would be likely to choose (16, 32, 64, 128), which
> is not very flexible anymore.
Sure, I don't see anything wrong with requiring the number of
partitions to be a power of two. On the other hand, I also suspect
that on modern hardware the cost of a shift versus a modulo operator
is almost irrelevant. What's really going to make this fast or slow
is the degree to which it has, or does not have, cache line
contention.
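For what it's worth, here is what the two index calculations look like; a
minimal sketch, with names that are mine rather than the patch's:

/* Sketch only: computing a freelist index from a hash code. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_FREELISTS 128           /* must be a power of two for the mask */

/* General case: any freelist count, at the price of an integer divide. */
static inline uint32_t
freelist_idx_mod(uint32_t hashcode, uint32_t nfreelists)
{
    return hashcode % nfreelists;
}

/* Power-of-two case: the same distribution from a single bitwise AND. */
static inline uint32_t
freelist_idx_mask(uint32_t hashcode)
{
    assert((NUM_FREELISTS & (NUM_FREELISTS - 1)) == 0);
    return hashcode & (NUM_FREELISTS - 1);
}

int
main(void)
{
    uint32_t h = 0xDEADBEEF;

    /* For a power-of-two count the two computations agree. */
    printf("%u %u\n", freelist_idx_mod(h, NUM_FREELISTS), freelist_idx_mask(h));
    return 0;
}

Either way the index computation is a handful of cycles; what will actually
dominate is whether the freelists different backends land on share cache
lines.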
> What about memory? If we allocate mutex, nentries and freeList
> dynamically depending on the items_number value, HASHHDR has to store
> pointers to the allocated memory, which means an additional
> indirection for almost every access to the mutex, nentries and
> freeList fields. Probably a bad idea. Otherwise we still waste shared
> memory, probably even more, since we don't want the maximum
> items_number value to be too low.
These seem like problems that admit of more than one good
solution, and which you are certainly capable of solving. For
example, you could always include space for 128 freelists but not tie
the number of freelists to the number of partition locks, or you could
always include space for 128 freelists and have the system only use
them all if there are 128 or more partitions. If there are fewer than
128 partitions, it uses only as many freelists as there are
partitions. Neither of these things sounds very hard to implement.
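Here is a sketch of the second of those options, just to show how little
code it takes (placeholder names, not a concrete proposal for the patch):

/*
 * Sketch: the header always reserves MAX_FREELISTS freelists, but a table
 * only spreads across as many of them as it has partitions, up to that
 * cap.  num_partitions is assumed to be a power of two.
 */
#include <stdint.h>
#include <stdio.h>

#define MAX_FREELISTS 128           /* space reserved unconditionally */

static inline uint32_t
freelist_idx(uint32_t hashcode, uint32_t num_partitions)
{
    uint32_t nfreelists = (num_partitions >= MAX_FREELISTS)
        ? MAX_FREELISTS
        : num_partitions;

    /* power-of-two count, so the AND acts as a cheap modulo */
    return hashcode & (nfreelists - 1);
}

int
main(void)
{
    /* e.g. a table with 16 partitions only ever touches freelists 0..15 */
    printf("%u\n", freelist_idx(0xDEADBEEFu, 16));
    return 0;
}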
Basically, the burden for you to impose a new coding rule on everybody
who uses shared hash tables in the future is very high. You would
need to show that that really isn't going to be an issue for anyone in
any reasonable situation. I don't think you can show that, because I
don't think it's true. I don't even like the idea of raising the
number of partitions for the lock manager for the convenience of your
code. If that turns out to have negative performance consequences in
some situation, we'd either have to live with them or revert your entire
patch. Those kinds of trade-offs aren't good for you as a patch
author, or for the product. I'm not going to commit this patch if it
imposes new constraints on choosing the number of partitions in a
dynamic hash table, and I'm going to oppose anyone else committing it
either. If this problem couldn't be solved without imposing such
constraints, we might decide (after long soul-searching) that it was
worth it. But it seems pretty clear that they aren't necessary in
this case, so I'm against them.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company