From: | "Goel, Dhruv" <goeldhru(at)amazon(dot)com> |
---|---|
To: | Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY |
Date: | 2019-06-06 22:13:14 |
Message-ID: | 3A3950FB-5887-439C-976E-14295B219CE3@amazon.com |
Lists: | pgsql-hackers |
Yes, you are correct. The failing case is a tuple inserted after the reference snapshot is taken in Phase 2 and before the index is marked ready: if that tuple is deleted before the reference snapshot of Phase 3 is taken, it never makes it into the index. I have fixed this problem by making the pg_index tuple updates transactional (I believe there is no longer any reason for them to be done in place) so that the xmin of the pg_index tuple is the same as the xmin of the snapshot in Phase 3.
Attached is the amended patch.
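The race described above can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not PostgreSQL code: every transaction commits immediately, a snapshot is reduced to a single cutoff XID (XIDs below it are visible, XIDs at or above it are not), and "the inserter already saw the index as ready" is modelled as the tuple's xmin being at or above a chosen XID. The names `Xid`, `Tuple`, `Snapshot`, `tuple_visible` and `tuple_indexed` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;                 /* hypothetical stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)

/* Hypothetical heap tuple: inserting and deleting transaction. */
typedef struct
{
    Xid xmin;                         /* inserting transaction */
    Xid xmax;                         /* deleting transaction, INVALID_XID if live */
} Tuple;

/* Hypothetical snapshot: XIDs below cutoff are committed and visible,
 * XIDs at or above cutoff are invisible (no in-progress list). */
typedef struct
{
    Xid cutoff;
} Snapshot;

static bool
xid_visible(Xid xid, Snapshot snap)
{
    return xid != INVALID_XID && xid < snap.cutoff;
}

/* Visible if the insert is visible and the delete (if any) is not. */
static bool
tuple_visible(Tuple tup, Snapshot snap)
{
    return xid_visible(tup.xmin, snap) && !xid_visible(tup.xmax, snap);
}

/*
 * In this model the tuple reaches the index only through one of three paths:
 *  1. the Phase 2 build scan sees it;
 *  2. its inserter already saw the index as ready (modelled here as
 *     xmin >= first_ready_xid, an assumption of this sketch);
 *  3. the Phase 3 validation scan sees it and inserts it.
 */
static bool
tuple_indexed(Tuple tup, Snapshot phase2, Xid first_ready_xid,
              Snapshot phase3)
{
    return tuple_visible(tup, phase2)
        || tup.xmin >= first_ready_xid
        || tuple_visible(tup, phase3);
}
```

With a Phase 2 cutoff of 100, the index seen as ready from XID 115 on, and a Phase 3 cutoff of 120, a tuple inserted by XID 105 and deleted by XID 110 is never indexed, yet a backend whose snapshot cutoff is 108 still sees it as live. Such a backend must therefore not be allowed to use the index.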
From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Date: Wednesday, May 15, 2019 at 3:45 AM
To: "Goel, Dhruv" <goeldhru(at)amazon(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY
Hello,
On Wed, May 15, 2019 at 1:45 PM Goel, Dhruv <goeldhru(at)amazon(dot)com> wrote:
Proposed Solution:
We remove the third wait state completely from the concurrent index build. When we mark the index as ready, we also set “indcheckxmin” to true, which essentially forces Postgres not to use this index for older snapshots.
I think there is a problem in the proposed solution. When phase 3 is reached, the index is valid, but it might not contain tuples deleted just before the reference snapshot was taken. Hence, we wait for those transactions that might have an older snapshot. The TransactionXmin of these transactions can be greater than the xmin of the pg_index entry for this index.
If, instead of waiting in the third phase, we just set indcheckxmin to true, the above transactions will be able to use the index, which is wrong: they won't find in the index the recently deleted tuples that are still live according to their snapshots.
The relevant code from get_relation_info():
    if (index->indcheckxmin &&
        !TransactionIdPrecedes(HeapTupleHeaderGetXmin(indexRelation->rd_indextuple->t_data),
                               TransactionXmin))
    {
        /* don't use this index */
    }
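The `indcheckxmin` test quoted above can be exercised outside the server. In this standalone sketch, `xid_precedes` follows the wraparound-aware comparison that `TransactionIdPrecedes()` applies to normal XIDs (modulo-2^32 via a signed 32-bit difference), while `index_usable` and the sample XID values are hypothetical, chosen only to illustrate the scenario described above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;       /* simplified stand-in for the server type */

#define FirstNormalTransactionId ((TransactionId) 3)
#define XidIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Circular comparison for normal XIDs, plain comparison for special ones. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    if (!XidIsNormal(id1) || !XidIsNormal(id2))
        return id1 < id2;
    return (int32_t) (id1 - id2) < 0;
}

/*
 * Hypothetical helper: may a backend whose TransactionXmin is
 * transaction_xmin use an index whose pg_index row has xmin indxmin?
 * Mirrors the get_relation_info() condition quoted above.
 */
static bool
index_usable(bool indcheckxmin, TransactionId indxmin,
             TransactionId transaction_xmin)
{
    if (indcheckxmin && !xid_precedes(indxmin, transaction_xmin))
        return false;                 /* don't use this index */
    return true;
}
```

With a pg_index xmin of 1000, a backend whose TransactionXmin is 990 skips the index, but one whose TransactionXmin is 1005 uses it. The objection above is that a transaction holding a snapshot older than the Phase 3 reference snapshot can still fall into the second group, so the check alone does not protect it.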
Please let me know if I'm missing something.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
patch-cic.patch | application/octet-stream | 7.2 KB |