Re: Why does CREATE INDEX CONCURRENTLY need two scans?

From: Joshua Ma <josh(at)benchling(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why does CREATE INDEX CONCURRENTLY need two scans?
Date: 2015-04-01 03:51:00
Message-ID: CAG9XPV=v1HDgvk-SnKt4JDsyL45t2f8DeW3fZwoBEupJS3pz4Q@mail.gmail.com
Lists: pgsql-general

Hi Michael,

Isn't that also true during the 2nd scan? I'm assuming new inserts during
the 2nd scan properly update the index, so couldn't the same mechanism
update the index during the 1st scan?

I guess I'm confused because, if you assume a pathological case where all
the data gets inserted after the 1st snapshot, the 1st scan wouldn't pick
anything up and the 3-step process I had earlier becomes identical to the
2-step process. Is there something special about index_build?

- Josh

On Tue, Mar 31, 2015 at 7:08 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

>
>
> On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh(at)benchling(dot)com> wrote:
>
>> Hi all,
>>
>> I was curious about why CONCURRENTLY needs two scans to complete - from
>> the documentation on HOT (access/heap/README.HOT), it looks like the
>> process is:
>>
>> 1) insert pg_index entry, wait for relevant in-progress txns to finish
>> (before marking index open for inserts, so HOT updates won't write
>> incorrect index entries)
>> 2) build index in 1st snapshot, mark index open for inserts
>> 3) in 2nd snapshot, validate index and insert missing tuples since first
>> snapshot, mark index valid for searches
>>
>> Why are two scans necessary? What would break if it did something like
>> the following?
>>
>> 1) insert pg_index entry, wait for relevant txns to finish, mark index
>> open for inserts
>> 2) build index in a single snapshot, mark index valid for searches
>>
>> Wouldn't new inserts update the index correctly? Between the snapshot and
>> index-updating txns afterwards, wouldn't all updates be covered?
>>
>
> When an index is built with index_build, only the tuples seen at the
> start of the first scan are included in the index. A second scan is needed
> to add index entries for the tuples that were inserted into the table
> during the build phase.
> --
> Michael
>
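
The three-step process quoted above can be sketched as a toy model. This is only an illustration, not PostgreSQL code: ToyTable, ToyIndex, and build_concurrently are hypothetical names, the "snapshot" is just a copy of a Python list, and the sketch assumes (as step 2 in the thread states) that the index is opened for inserts only after the initial build finishes, so rows written during the build are not indexed by their writers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    # Hypothetical stand-ins for "open for inserts" and "valid for searches".
    ready: bool = False
    valid: bool = False
    entries: set = field(default_factory=set)


@dataclass
class ToyTable:
    rows: list = field(default_factory=list)
    index: ToyIndex = field(default_factory=ToyIndex)

    def insert(self, key):
        """A writer adds a row; it maintains the index only if it is ready."""
        self.rows.append(key)
        if self.index.ready:
            self.index.entries.add(key)


def build_concurrently(table, during_build=(), second_scan=True):
    """Toy version of steps 2 and 3 from the thread (assumed ordering)."""
    # Step 2: build from the first snapshot. The index is not yet open for
    # inserts, so rows written during the build are not added by writers.
    snapshot = list(table.rows)
    for key in during_build:
        table.insert(key)  # concurrent writers, simulated sequentially
    table.index.entries.update(snapshot)
    table.index.ready = True

    # Step 3: a second scan adds whatever the first snapshot could not see.
    if second_scan:
        for key in list(table.rows):
            if key not in table.index.entries:
                table.index.entries.add(key)

    table.index.valid = True
    return table.index


if __name__ == "__main__":
    with_scan = ToyTable(rows=[1, 2, 3])
    build_concurrently(with_scan, during_build=[4, 5])
    print("missing with second scan:",
          set(with_scan.rows) - with_scan.index.entries)

    without_scan = ToyTable(rows=[1, 2, 3])
    build_concurrently(without_scan, during_build=[4, 5], second_scan=False)
    print("missing without second scan:",
          set(without_scan.rows) - without_scan.index.entries)
```

In this model, the rows passed as during_build are exactly the ones that only the second scan can pick up. The pathological case from the reply (every row inserted after the first snapshot) corresponds to starting with an empty rows list. Whether the build itself could instead tolerate concurrent inserts is the open question of the thread, and the sketch does not settle it.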
