From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
Cc: | Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: adding wait_start column to pg_locks |
Date: | 2021-02-15 06:17:51 |
Message-ID: | 655f1a5b-4fa8-53ba-c3e2-8cee31587d37@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2021/02/10 10:43, Fujii Masao wrote:
>
>
> On 2021/02/09 23:31, torikoshia wrote:
>> On 2021-02-09 22:54, Fujii Masao wrote:
>>> On 2021/02/09 19:11, Fujii Masao wrote:
>>>>
>>>>
>>>> On 2021/02/09 18:13, Fujii Masao wrote:
>>>>>
>>>>>
>>>>> On 2021/02/09 17:48, torikoshia wrote:
>>>>>> On 2021-02-05 18:49, Fujii Masao wrote:
>>>>>>> On 2021/02/05 0:03, torikoshia wrote:
>>>>>>>> On 2021-02-03 11:23, Fujii Masao wrote:
>>>>>>>>>> 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"?
>>>>>>>>>
>>>>>>>>> Also it might be worth thinking to use 64-bit atomic operations like
>>>>>>>>> pg_atomic_read_u64(), for that.
>>>>>>>>
>>>>>>>> Thanks for your suggestion and advice!
>>>>>>>>
>>>>>>>> In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64().
>>>>>>>>
>>>>>>>> waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only supports unsigned int, so I cast the type.
>>>>>>>>
>>>>>>>> I may be using these functions not correctly, so if something is wrong, I would appreciate any comments.
>>>>>>>>
>>>>>>>>
>>>>>>>> About the documentation, since your suggestion seems better than v6, I used it as is.
>>>>>>>
>>>>>>> Thanks for updating the patch!
>>>>>>>
>>>>>>> + if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
>>>>>>> + pg_atomic_write_u64(&MyProc->waitStart,
>>>>>>> + pg_atomic_read_u64((pg_atomic_uint64 *) &now));
>>>>>>>
>>>>>>> pg_atomic_read_u64() is really necessary? I think that
>>>>>>> "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
>>>>>>>
>>>>>>> + deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
>>>>>>> + pg_atomic_write_u64(&MyProc->waitStart,
>>>>>>> + pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
>>>>>>>
>>>>>>> Same as above.
>>>>>>>
>>>>>>> + /*
>>>>>>> + * Record waitStart reusing the deadlock timeout timer.
>>>>>>> + *
>>>>>>> + * It would be ideal this can be synchronously done with updating
>>>>>>> + * lock information. Howerver, since it gives performance impacts
>>>>>>> + * to hold partitionLock longer time, we do it here asynchronously.
>>>>>>> + */
>>>>>>>
>>>>>>> IMO it's better to comment why we reuse the deadlock timeout timer.
>>>>>>>
>>>>>>> proc->waitStatus = waitStatus;
>>>>>>> + pg_atomic_init_u64(&MyProc->waitStart, 0);
>>>>>>>
>>>>>>> pg_atomic_write_u64() should be used instead? Because waitStart can be
>>>>>>> accessed concurrently there.
>>>>>>>
>>>>>>> I updated the patch and addressed the above review comments. Patch attached.
>>>>>>> Barring any objection, I will commit this version.
>>>>>>
>>>>>> Thanks for modifying the patch!
>>>>>> I agree with your comments.
>>>>>>
>>>>>> BTW, I ran pgbench several times before and after applying
>>>>>> this patch.
>>>>>>
>>>>>> The environment is virtual machine(CentOS 8), so this is
>>>>>> just for reference, but there were no significant difference
>>>>>> in latency or tps(both are below 1%).
>>>>>
>>>>> Thanks for the test! I pushed the patch.
>>>>
>>>> But I reverted the patch because buildfarm members rorqual and
>>>> prion don't like the patch. I'm trying to investigate the cause
>>>> of this failures.
>>>>
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10
>>>
>>> - relation | locktype | mode
>>> ------------------+----------+---------------------
>>> - test_prepared_1 | relation | RowExclusiveLock
>>> - test_prepared_1 | relation | AccessExclusiveLock
>>> -(2 rows)
>>> -
>>> +ERROR: invalid spinlock number: 0
>>>
>>> "rorqual" reported that the above error happened in the server built with
>>> --disable-atomics --disable-spinlocks when reading pg_locks after
>>> the transaction was prepared. The cause of this issue is that "waitStart"
>>> atomic variable in the dummy proc created at the end of prepare transaction
>>> was not initialized. I updated the patch so that pg_atomic_init_u64() is
>>> called for the "waitStart" in the dummy proc for prepared transaction.
>>> Patch attached. I confirmed that the patched server built with
>>> --disable-atomics --disable-spinlocks passed all the regression tests.
>>
>> Thanks for fixing the bug, I also tested v9.patch configured with
>> --disable-atomics --disable-spinlocks on my environment and confirmed
>> that all tests have passed.
>
> Thanks for the test!
>
> I found another bug in the patch. InitProcess() initializes "waitStart",
> but previously InitAuxiliaryProcess() did not. This could cause "invalid
> spinlock number" error when reading pg_locks in the standby server.
> I fixed that. Attached is the updated version of the patch.
I pushed this version. Thanks!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2021-02-15 06:23:00 | Re: logical replication seems broken |
Previous Message | Thomas Munro | 2021-02-15 05:32:46 | Re: PG vs LLVM 12 on seawasp, next round |