From: | Joshua Ma <josh(at)benchling(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Why does CREATE INDEX CONCURRENTLY need two scans? |
Date: | 2015-04-01 04:06:27 |
Message-ID: | CAG9XPV=sdOrbPs3d+M8qd5RpFTGpaHM=F1s3MS4L3XHYB-VAww@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Ah, that's exactly what I was looking for. Thanks everyone for the
responses!
- Josh
ᐧ
On Tue, Mar 31, 2015 at 8:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh(at)benchling(dot)com> wrote:
> >> Why are two scans necessary? What would break if it did something like
> the
> >> following?
> >>
> >> 1) insert pg_index entry, wait for relevant txns to finish, mark index
> >> open for inserts
> >>
> >> 2) build index in a single snapshot, mark index valid for searches
>
> >> Wouldn't new inserts update the index correctly? Between the snapshot
> and
> >> index-updating txns afterwards, wouldn't all updates be covered?
>
> > When an index is built with index_build, are included in the index only
> the
> > tuples seen at the start of the first scan. A second scan is needed to
> add
> > in the index entries for the tuples that have been inserted into the
> table
> > during the build phase.
>
> More to the point: Joshua's design supposes that retail insertions into
> an index can happen in parallel with index build. Or in other words,
> that index build consists of instantaneously creating an empty-but-valid
> index file and then doing a lot of ordinary inserts into it. That's a
> possible design, but it's not very efficient, and most of our index AMs
> don't do it that way. btree, for instance, starts by sorting all the
> entries and creating the leaf-level pages. Then it builds the upper tree
> levels. It doesn't have a complete tree that could support retail
> insertions until the very end. Moreover, most of the work is done in
> storage that's local to the backend running CREATE INDEX, and isn't
> accessible to other processes at all.
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2015-04-01 04:17:14 | Re: Would like to know how analyze works technically |
Previous Message | Tom Lane | 2015-04-01 03:54:38 | Re: Why does CREATE INDEX CONCURRENTLY need two scans? |