From: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
To: "Casey T(dot) Deccio" <ctdecci(at)sandia(dot)gov>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Building a database from a flat file
Date: 2005-03-03 13:23:55
Message-ID: 8014dca0db521733863bb7967ce4be95@mail.nih.gov
Lists: pgsql-sql
On Mar 2, 2005, at 7:15 PM, Casey T. Deccio wrote:
> A database I am currently using is built and updated periodically from a
> flat csv file (the situation is rather unfortunate, but that's all I
> have right now). The schema I use is more complex than the flat file,
> so I follow a process to populate the tables with the data from the
> file. First I slurp the whole file into one temporary table, whose
> columns correspond to the columns in the file. Then I DELETE all the
> existing rows from the tables in the schema and perform a series of
> queries on that table to INSERT and UPDATE rows in the tables that are
> in the schema. Then I DELETE the data from the temporary table. I do
> it this way, rather than trying to synchronize it, because of the
> inconsistencies and redundancies in the flat file.
>
> There is more than one problem with this, but the largest is that I
> would like to perform this whole database rebuild within one
> transaction, so other processes that need to access the database can do
> so without noticing the disturbance. However, performing this set of
> events (besides populating the temporary table) within a single
> transaction takes a long time--over an hour in some cases.
>
> What are some suggestions to help improve performance with replacing
> one set of data in a schema with another?
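The load step described above--slurping the whole file into one staging table--can be sketched roughly as follows. This is only an illustration; the table name, column names, and file path are assumptions, not taken from the original message, and COPY's CSV mode requires PostgreSQL 8.0 or later:

    -- Hypothetical staging table whose columns mirror the CSV file.
    CREATE TABLE staging_raw (
        id    text,
        name  text,
        value text
    );

    -- Bulk-load the flat file; COPY is far faster than row-by-row INSERTs.
    COPY staging_raw FROM '/path/to/data.csv' WITH CSV;

    -- ...normalize into the real schema with INSERT/UPDATE queries...

    -- Clear the staging table when done.
    TRUNCATE staging_raw;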
Why not rebuild the entire thing in a separate "build" schema, and then do
only the cheap operations, like copying tables, inside the transaction block:
BEGIN;
TRUNCATE table1;
TRUNCATE table2;
-- reset any sequences you feel you need to have reset;
-- use INSERT ... SELECT, not SELECT INTO: the target tables already exist
INSERT INTO table1 SELECT * FROM build.table1;
INSERT INTO table2 SELECT * FROM build.table2;
ANALYZE table1;
ANALYZE table2;
COMMIT;
-- VACUUM cannot run inside a transaction block, so run it afterward if needed:
VACUUM ANALYZE table1;
VACUUM ANALYZE table2;
I haven't tried this method exactly, but building in a separate schema
(expensive) and then doing cheap operations like copying the table into
the working schema should minimize the amount of time you spend inside
the transaction block.
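Setting up the "build" schema itself might look something like this--a minimal sketch using the table names from the example above; the LIKE clause in CREATE TABLE requires PostgreSQL 7.4 or later:

    -- One-time setup: a scratch schema holding empty copies of the real tables.
    CREATE SCHEMA build;
    CREATE TABLE build.table1 (LIKE table1);
    CREATE TABLE build.table2 (LIKE table2);

    -- On each rebuild, do all the expensive cleanup work here, outside any
    -- transaction that readers of the working schema would notice:
    TRUNCATE build.table1;
    TRUNCATE build.table2;
    -- INSERT INTO build.table1 SELECT ... ; -- dedup/cleanup queries go here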
**I don't know what effect this will have on performance of the whole
process, though**.
Sean