From: | Kirk Wythers <kirk(dot)wythers(at)gmail(dot)com> |
---|---|
To: | Steve Crawford <scrawford(at)pinpointresearch(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: copy from questions |
Date: | 2012-12-19 21:06:07 |
Message-ID: | 79CD04C6-6CFB-47DA-9BED-10E5DBC76A5E@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Thanks for the reply Steve. These suggestions are new to me, so I'd like to rephrase them back to you in order to make sure I understand the bits and details.
On Dec 19, 2012, at 10:48 AM, Steve Crawford <scrawford(at)pinpointresearch(dot)com> wrote:
>>
> I suppose you could use a trigger to check each record before inserting but that is likely to be inefficient for bulk loads. A quick bash loop is probably your best bet. Something along the lines of:
>
> for inputfile in /infiledirectory/*.csv
> do
> cat inputfile | psql [connection-params] -c '\copy rawinput from stdin csv header...'
> done
I get this… If my except for the header… bit. Here is my interpretation of your code including my specific connection parameters.
#!/bin/sh
for inputfile in '/Volumes/disk7/b4warmed3/export/60min2/*.txt'
do
cat inputfile | psql -p 54321 -h localhost -c '\copy rawinput FROM stdin WITH CSV HEADER DELIMTER AS ',' NULL AS 'NA '
done
I added single quotes around the path to the input files. Correct right?
>
> This imports everything into a "staging" table (I called it rawinput). From there you can create your final table with SELECT DISTINCT…
This bit needs to be as a separate step right? (rowid is the primary key)
SELECT DISTINCT ON (rowid) *
FROM rawinput;
From here do I need another COPY FROM or some kind of INSERT statement?
>
> For speed make sure that you create your staging table as "unlogged".
I understand that I need to create the rawinput table first, but I am unfamiliar with the "UNLOGGED" option. I assume it makes things faster… Does it go something like:
CREATE TABLE UNLOGGED rawinput;
Do I need to create all the variables (including types) in rawinput as well? If so, then I assume that I do not want rowid to have a primary key… or else I would be back where I started.
>
> Cheers,
> Steve
>
Thanks again,
Kirk
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2012-12-19 21:16:08 | Re: pg_xlog is getting bigger |
Previous Message | Kevin Grittner | 2012-12-19 20:40:47 | Re: pg_xlog is getting bigger |