From: Bruce Momjian <bruce(at)momjian(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Batch update of indexes on data loading
Date: 2008-03-05 15:56:17
Message-ID: 200803051556.m25FuHh10617@momjian.us
Lists: pgsql-hackers
Added to TODO:
o Allow COPY FROM to create index entries in bulk
http://archives.postgresql.org/pgsql-hackers/2008-02/msg00811.php
---------------------------------------------------------------------------
ITAGAKI Takahiro wrote:
> This is a proposal for fast data loading using batch update of indexes for 8.4.
> It is a part of pg_bulkload (http://pgbulkload.projects.postgresql.org/) and
> I'd like to integrate it so that it can cooperate with other parts of postgres.
>
> The basic concept is to spool incoming data and, at the end of data
> loading, merge the spool and the existing indexes into a new index.
> This is 5-10 times faster than per-row index insertion, which is the
> method used in 8.3.
>
>
> One of the problems is locking; index building in bulkload is similar to
> REINDEX rather than INSERT, so we need an ACCESS EXCLUSIVE lock during it.
> Bulkloading is not an upward-compatible method, so I'm thinking about
> adding a new "WITH LOCK" option for the COPY command.
>
> COPY tbl FROM 'datafile' WITH LOCK;
>
> If the LOCK option is specified, the behavior of COPY will be changed
> as follows:
>
> 1. Lock the target table in ACCESS EXCLUSIVE mode instead of ROW EXCLUSIVE.
> 2. Prepare a spooler (BTSpool) for each index.
> 3. For each new row, put index entries into the spools (_bt_spool)
>    instead of calling index_insert.
> 4. At the end of COPY, merge each spool and the existing index into a new
>    index file. The relfilenode of the index is changed, as in REINDEX.
>
> However, there might be better interfaces for bulk index creation.
> For example, if we want to use it with pgloader, we might need a
> "bulkload mode" for indexes. pgloader commits every 10000 rows,
> so the index spooler must stay alive across transactions until the
> end of the session (or across subtransactions until the end of the
> transaction).
>
> I'm working toward the simple "COPY WITH LOCK" approach for now,
> but if there are other better ideas, I want to use them.
> Advice and suggestions are welcome.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>
>
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +