From: | "Dave Held" <dave(dot)held(at)arrayservicesgrp(dot)com> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Modifying COPY TO |
Date: | 2005-02-25 22:19:09 |
Message-ID: | 49E94D0CFCD4DB43AFBA928DDD20C8F902618457@asg002.asg.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I am interested in hacking COPY TO such that one can specify that
rows are copied in a certain index order. I got as far as
src/backend/commands/copy.c:CopyTo(), and it looks like I would need
to modify the call to heap_beginscan() so that it uses a key. However,
I couldn't figure out how to provide one, or if I'm even looking at the
right area. Ideally, this behavior would be specified with a flag,
perhaps: "WITH INDEX <index_name>" or "WITH PRIMARY KEY"
or something similar.
The motivation for this change is as follows. I have a fairly large
database (10 million+ records) that mirrors the data in a proprietary
system. The only access to that data is through exported flat files.
Currently, those flat files are copied directly into a staging area in the
db via a COPY FROM, the actual tables are truncated, and the
staging data is inserted into the live tables. Since the data is read-only,
it doesn't matter that it is recreated every day. However, as you
can imagine, the import process takes quite a while (several hours).
Also, rebuilding the db from scratch every day loses any statistical
information gathered from the execution of queries during the day.
A possibility that I would like to pursue is to keep the staging data
from the previous day, do a COPY TO, import the new data into
another staging table with a COPY FROM, then export the fresh
data with another COPY TO. Then, I can write a fast C/C++
program to do a line-by-line comparison of each record, isolating
the ones that have changed from the previous day. I can then
emit those records in a change file that should be relatively small
and easy to update. Of course, this scheme can only work if
COPY TO emits the records in a reliable order.
Any assistance on this project would be greatly appreciated. The
best I can see, I'm stuck on line 1053 from copy.c:
scandesc = heap_beginscan(rel, mySnapshot, 0, NULL);
I suspect that I want it to look like this:
scandesc = heap_beginscan(rel, mySnapshot, 1, key);
where 'key' is an appropriately constructed ScanKey. It looks
like I want to call ScanKeyEntryInitialize(), but I'm not sure what
parameters I need to pass to it to get an index or the primary
key. I mostly need help building the ScanKey object. I think I
can figure out how to hack the custom option, etc. I should
mention that I am using the 7.4.7 codebase on Linux 2.4.
__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East, Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2005-02-25 22:56:50 | Re: Modifying COPY TO |
Previous Message | Bruce Momjian | 2005-02-25 22:09:20 | Re: Development schedule |