Re: plpgsql copy import csv double quotes

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: PASCAL CROZET <pascal(dot)crozet(at)qualis-consulting(dot)com>, PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: plpgsql copy import csv double quotes
Date: 2019-10-10 19:20:15
Message-ID: babdc610-52d9-8f71-7725-22a55adce71c@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10/10/19 8:30 AM, PASCAL CROZET wrote:
> Thanks,
>
>
> As it's write in the page,
> <https://www.postgresql.org/docs/9.3/sql-copy.html> the file format is
> more a convention than a standard.
>
>
> With collegues, we've find the solution with a regex in sed
>
>
> sed -e 's/\([^,]\)"\([^,]\)/\1\2/g' -e  's/\([^,]\)"\([^,]\)/\1\2/g'
>
> Because file contains values like
>
> ,"hostname1.fqdn.ad|\"\"\"\"\"\"\"\"\"\"0Clean|OK"

The issue is less the file format then the data value format. I am
struggling to figure out what the above is doing.

>
>
> But, other csv file, that contains this value ","NAME=\"UBUNTU\"","|
> works well. I find this value in destination column |NAME=\UBUNTU\| in
> the destination table.
>
> The main lines in the plpgsql function are :

You might want to look at dollar quoting:

https://www.postgresql.org/docs/9.3/plpgsql-development-tips.html
40.11.1. Handling of Quotation Marks

If it where me I would separate out the commands below into distinct
EXECUTES, it would be easier to follow. That will still result in the
TRUNCATE being rolled back as it is part of the function transaction and:

https://www.postgresql.org/docs/11/sql-truncate.html
"TRUNCATE is transaction-safe with respect to the data in the tables:
the truncation will be safely rolled back if the surrounding transaction
does not commit."

If you want to deal with errors then:

https://www.postgresql.org/docs/9.3/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

>
> CREATE OR REPLACE FUNCTION insert_into_db_sources_csv(
>     filename character varying,
>     tablename character varying,
>     delimiter character varying,
>     header character varying)
>   RETURNS void AS
> ....
> request := 'TRUNCATE ' || tablename || '; COPY ' || tablename || ' FROM
> ''' || filename || ''' CSV ' || header || ' DELIMITER ''' || delimiter
> || ''' ENCODING ''UTF-8'';';
> EXECUTE request;
>
> The function call :
>
> select
> insert_into_db_sources_csv('/DATA/input/files/Extract_software.csv',
> 't_m03_software', ',', 'HEADER');
>
>
> If the import fails, TRUNCATE isn't executed. The previous data's that
> was in table remains the same.
>
>
> *_________________________________*
>
> Cordialement, *Pascal CROZET**
> *
>
> *DBA *
>
> •www.qualis-consulting.com <http://www.qualis-consulting.com/>•04 78 22
> 74 90
>
> •Le Bois des Côtes 1 – Bâtiment A
> •300 Route Nationale 6 – 69760 LIMONEST
> *_________________________________*
>
>
>
> ------------------------------------------------------------------------
> *De :* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> *Envoyé :* jeudi 10 octobre 2019 00:31
> *À :* PASCAL CROZET
> *Cc :* PG-General Mailing List
> *Objet :* Re: plpgsql copy import csv double quotes
> PASCAL CROZET <pascal(dot)crozet(at)qualis-consulting(dot)com> writes:
>> I’ve experience issues with double quotes \34 inside fields, in a csv file.
>
>> Ex :
>> "value1","some text","other text with "double quotes" inside","last field"
>
> I don't know of any definition of CSV format by which that's legal data.
> The typical rule is that double quotes that are data must be doubled;
> at least, that's what COPY expects by default.  You can also get COPY
> to handle variants like backslash-quote.
>
>                         regards, tom lane

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2019-10-10 19:32:38 Re: Segmentation fault with PG-12
Previous Message Yessica Brinkmann 2019-10-10 18:35:00 Re: The connection to the server was lost. Attempting reset: Failed.