Re: URGENT: temporary table not recognized?

From: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: URGENT: temporary table not recognized?
Date: 2012-01-06 23:55:42
Message-ID: CAFWfU=t_zqc0qBXTgUxDW1Cx0dviZbgUbT-Vj04vvP=Dyzihfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Jan 6, 2012 at 6:53 PM, Steve Crawford
<scrawford(at)pinpointresearch(dot)com> wrote:
> On 01/06/2012 03:42 PM, Phoenix Kiula wrote:
>>
>> On Fri, Jan 6, 2012 at 6:20 PM, Adrian Klaver<adrian(dot)klaver(at)gmail(dot)com>
>>  wrote:
>>
>>> http://www.postgresql.org/docs/9.0/interactive/sql-copy.html
>>>
>>> Search for
>>> NULL
>>
>>
>>
>> Thanks Adrian.
>>
>> Without examples, it's hard to predict syntax. If the value after a
>> pipe is missing altogether, I suppose the missing value is "\n"
>> (newline). But this doesn't work:
>>
>>    copy vl from 'data.txt' WITH DELIMITER AS '|' NULL '\n';
>>
>> None of these work either:
>>
>>    copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \n;
>>    copy vl from 'data.txt' WITH DELIMITER AS '|' NULL \\n;
>>    copy vl from 'data.txt' WITH DELIMITER AS '|' NULL '';
>>
>> The first two give errors, the third one throws the same missing value
>> for column error.
>>
>> The data is stored like this:
>>
>>     123|big string here|189209209|US|2001-01-01
>>     123|big string here|189209209|US|2001-01-01
>>     123|big string here|189209209|US|2001-01-01
>>     123|big string here|189209209|US|2001-01-01
>>
>> But sometimes, the strings are:
>>
>>     |big string here|189209209|US|2001-01-01
>>     |big string here|189209209|US|2001-01-01
>>
>> Or
>>
>>     123|big string here|189209209|US
>>     123|big string here|189209209|US|
>>
>> So you see either the first column, which is the ID in a way, is
>> missing so the "missing character" is probably a blank (''?). In this
>> case I want COPY to just ignore this line.
>>
>> Or the last column is missing, where the missing character can be a
>> newline I suppose?
>>
>> So how do I specify this in the COPY command so that it doesn't croak?
>> If a line's ID is missing, it should ignore the line and go on instead
>> of not doing anything by throwing an error for EVERYTHING!
>>
>> Thanks.
>>
>>
> Missing data is one thing, missing delimiters is another. Try doing a small
> copy of data with just a few lines to see which variants are actually
> causing the error. My money is on the one that has a mismatch between the
> table column count and the data column count. I.e., the row with three
> delimiters instead of four:
>
>
> 23|big string here|189209209|US
>
> When you say "ignore", do you mean that you want PostgreSQL to assume a null
> value for the missing column or to not import that row at all?
>
> In general, when you have data scrubbing issues like this, grep/sed/awk/...
> are your friends. Clean it up then import it.
>
> I suppose you could import all rows into a big text field and process it in
> PostgreSQL but I doubt you will find that to be an optimal solution.

Thanks Steve.

The file has 350 million lines. Sed, Awk etc are a little painful when
the file is 18GB witht hat many lines.

I'd want Postgresql to ignore the line altogether when something is
missing. Is this an option we can use, or are rules hoisted on us?

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Phoenix Kiula 2012-01-07 00:16:09 Re: URGENT: temporary table not recognized?
Previous Message Adrian Klaver 2012-01-06 23:54:58 Re: URGENT: temporary table not recognized?