From: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Skip all-visible pages during second HeapScan of CIC |
Date: | 2017-02-28 13:42:03 |
Message-ID: | CABOikdO+=3=rK_Y=8o-xd5oPiNSPsoORYThJUCNE8kWm1pWOow@mail.gmail.com |
Lists: | pgsql-hackers |
Hello All,
During the second heap scan of CREATE INDEX CONCURRENTLY, we're only
interested in the tuples which were inserted after the first scan was
started. All such tuples can only exist in pages which have their VM bit
unset. So I propose the attached patch, which consults the VM during the
second scan and skips all-visible pages. We use the same trick of skipping
pages only if a certain threshold of pages can be skipped, to ensure the
OS's read-ahead is not disturbed.
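To illustrate the threshold idea, here is a minimal standalone sketch, not
the code from the patch. The function name plan_second_scan, the
all_visible array standing in for the VM, and the threshold value of 32 are
all assumptions made for illustration; the patch itself works on the real
visibility map and may use a different threshold.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold; the value used by the patch is not stated here. */
#define SKIP_PAGES_THRESHOLD 32

/*
 * all_visible[i] stands in for the VM bit of heap block i.
 * Sets scan[i] to true for every block the second scan would read
 * and returns the number of such blocks.
 */
size_t
plan_second_scan(const bool *all_visible, size_t nblocks, bool *scan)
{
    size_t nscanned = 0;
    size_t blk = 0;

    while (blk < nblocks)
    {
        size_t run_end = blk;
        size_t run_len;
        bool   skip;
        size_t i;

        /* Measure the run of consecutive all-visible blocks from blk. */
        while (run_end < nblocks && all_visible[run_end])
            run_end++;

        /* Skip the run only if it is long enough to be worth it. */
        run_len = run_end - blk;
        skip = (run_len >= SKIP_PAGES_THRESHOLD);

        for (i = blk; i < run_end; i++)
        {
            scan[i] = !skip;
            if (!skip)
                nscanned++;
        }

        /* The block that ended the run is not all-visible: always scan it. */
        if (run_end < nblocks)
        {
            scan[run_end] = true;
            nscanned++;
            run_end++;
        }

        blk = run_end;
    }

    return nscanned;
}
```

Short runs of all-visible blocks are still read in this sketch, so the scan
stays sequential between nearby pages that need a visit; only long runs are
jumped over.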
As expected, the patch shows a significant reduction in the time taken to
build an index concurrently on very large tables which are not being updated
frequently and which were vacuumed recently (so that VM bits are set). I can
post performance numbers if there is interest. For tables that are being
updated heavily, the threshold skipping was indeed useful; without it we saw
a slight regression.
Since VM bits are only set during VACUUM, which conflicts with CIC on the
relation lock, I don't see any risk of incorrectly skipping pages that the
second scan should have scanned.
Comments?
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
cic_skip_all_visible_v3.patch | application/octet-stream | 10.0 KB |