Re: Why does CREATE INDEX CONCURRENTLY need two scans?

From: Joshua Ma <josh(at)benchling(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why does CREATE INDEX CONCURRENTLY need two scans?
Date: 2015-04-01 04:06:27
Message-ID: CAG9XPV=sdOrbPs3d+M8qd5RpFTGpaHM=F1s3MS4L3XHYB-VAww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Ah, that's exactly what I was looking for. Thanks everyone for the
responses!

- Josh

On Tue, Mar 31, 2015 at 8:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh(at)benchling(dot)com> wrote:
> >> Why are two scans necessary? What would break if it did something like
> the
> >> following?
> >>
> >> 1) insert pg_index entry, wait for relevant txns to finish, mark index
> >> open for inserts
> >>
> >> 2) build index in a single snapshot, mark index valid for searches
>
> >> Wouldn't new inserts update the index correctly? Between the snapshot
> and
> >> index-updating txns afterwards, wouldn't all updates be covered?
>
> > When an index is built with index_build, are included in the index only
> the
> > tuples seen at the start of the first scan. A second scan is needed to
> add
> > in the index entries for the tuples that have been inserted into the
> table
> > during the build phase.
>
> More to the point: Joshua's design supposes that retail insertions into
> an index can happen in parallel with index build. Or in other words,
> that index build consists of instantaneously creating an empty-but-valid
> index file and then doing a lot of ordinary inserts into it. That's a
> possible design, but it's not very efficient, and most of our index AMs
> don't do it that way. btree, for instance, starts by sorting all the
> entries and creating the leaf-level pages. Then it builds the upper tree
> levels. It doesn't have a complete tree that could support retail
> insertions until the very end. Moreover, most of the work is done in
> storage that's local to the backend running CREATE INDEX, and isn't
> accessible to other processes at all.
>
> regards, tom lane
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2015-04-01 04:17:14 Re: Would like to know how analyze works technically
Previous Message Tom Lane 2015-04-01 03:54:38 Re: Why does CREATE INDEX CONCURRENTLY need two scans?