Re: Building a database from a flat file

From: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
To: "Casey T(dot) Deccio" <ctdecci(at)sandia(dot)gov>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Building a database from a flat file
Date: 2005-03-03 13:23:55
Message-ID: 8014dca0db521733863bb7967ce4be95@mail.nih.gov
Lists: pgsql-sql


On Mar 2, 2005, at 7:15 PM, Casey T. Deccio wrote:

> A database I am currently using is built and updated periodically
> from a flat csv file (The situation is rather unfortunate, but that's
> all I have right now). The schema I use is more complex than the flat
> file, so I follow a process to populate the tables with the data from
> the file. First I slurp the whole file into one temporary table,
> whose columns correspond to the columns in the file. Then I DELETE
> all the existing rows from the tables in the schema and perform a
> series of queries on that table to INSERT and UPDATE rows in the
> tables that are in the schema. Then I DELETE the data from the
> temporary table. I do it this way, rather than trying to synchronize
> it, because of the inconsistencies and redundancies in the flat file.
>
> There is more than one problem with this, but the largest is that I
> would like to perform this whole database rebuild within one
> transaction, so other processes that need to access the database can do
> so without noticing the disturbance. However, performing this set of
> events (besides populating the temporary table) within a single
> transaction takes a long time--over an hour in some cases.
>
> What are some suggestions to help improve performance with replacing
> one set of data in a schema with another?
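
If I'm reading this right, the slow part is that all of the
DELETE/INSERT/UPDATE reshaping queries run inside one transaction --
roughly this pattern (the table, column, and file names below are made
up, just to illustrate):

-- staging table with one column per column in the csv file
-- (COPY ... CSV needs 8.0; older versions need a plain delimited COPY)
CREATE TEMP TABLE staging (emp_name text, dept_name text);
COPY staging FROM '/tmp/data.csv' WITH CSV;

BEGIN;

-- wipe the real schema and rebuild it from the staging data
DELETE FROM departments;
INSERT INTO departments (name)
    SELECT DISTINCT dept_name FROM staging;

DELETE FROM staging;

COMMIT;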

Why not rebuild the entire thing in a separate "build" schema, and then
do only cheap operations, like copying the tables, inside the
transaction block:

BEGIN;

truncate table1;
truncate table2;

-- reset any sequences you feel you need to have reset

-- insert rather than SELECT INTO, since table1 and table2 already exist
insert into table1 select * from build.table1;
insert into table2 select * from build.table2;

-- ANALYZE is allowed inside a transaction block; VACUUM is not
analyze table1;
analyze table2;

COMMIT;

-- if you want a VACUUM as well, run it after the COMMIT
vacuum analyze table1;
vacuum analyze table2;

I haven't tried this exact method, but doing the expensive build in a
separate schema and then doing only cheap operations, like copying the
tables into the working schema, should minimize the amount of time you
spend inside the transaction block.
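
For example, the expensive part could all happen ahead of time, outside
the transaction, along these lines (again, the schema, table, column,
and file names are just made-up placeholders):

CREATE SCHEMA build;  -- drop and recreate it, or truncate its tables, on each run

-- staging table mirroring the csv columns
CREATE TABLE build.staging (emp_name text, dept_name text);
COPY build.staging FROM '/tmp/data.csv' WITH CSV;

-- reshape the flat data into copies of the real tables
CREATE TABLE build.table1 AS
    SELECT DISTINCT dept_name FROM build.staging;
CREATE TABLE build.table2 AS
    SELECT emp_name, dept_name FROM build.staging;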

**I don't know what effect this will have on the performance of the
whole process, though**.

Sean
