From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] multiline CSV fields |
Date: | 2004-12-03 17:13:26 |
Message-ID: | 200412031713.iB3HDQE04288@candle.pha.pa.us |
Lists: | pgsql-hackers pgsql-patches |
Patch applied. Thanks.
---------------------------------------------------------------------------
Andrew Dunstan wrote:
>
>
> I wrote:
>
> >
> > If it bothers you that much, I'd make a flag, cleared at the start of
> > each COPY, and then where we test for CR or LF in CopyAttributeOutCSV,
> > if the flag is not set then set it and issue the warning.
>
>
>
> I didn't realise until Bruce told me just now that I was on the hook for
> this. I guess I should keep my big mouth shut. (Yeah, that's gonna
> happen ...)
>
> Anyway, here's a tiny patch that does what I had in mind.
>
> cheers
>
> andrew
[ text/x-patch is unsupported, treating like TEXT/PLAIN ]
> Index: copy.c
> ===================================================================
> RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v
> retrieving revision 1.234
> diff -c -r1.234 copy.c
> *** copy.c 6 Nov 2004 17:46:27 -0000 1.234
> --- copy.c 2 Dec 2004 23:34:20 -0000
> ***************
> *** 98,103 ****
> --- 98,104 ----
> static EolType eol_type; /* EOL type of input */
> static int client_encoding; /* remote side's character encoding */
> static int server_encoding; /* local encoding */
> + static bool embedded_line_warning;
>
> /* these are just for error messages, see copy_in_error_callback */
> static bool copy_binary; /* is it a binary copy? */
> ***************
> *** 1190,1195 ****
> --- 1191,1197 ----
> attr = tupDesc->attrs;
> num_phys_attrs = tupDesc->natts;
> attr_count = list_length(attnumlist);
> + embedded_line_warning = false;
>
> /*
> * Get info about the columns we need to process.
> ***************
> *** 2627,2632 ****
> --- 2629,2653 ----
> !use_quote && (c = *test_string) != '\0';
> test_string += mblen)
> {
> + /*
> + * We don't know here what the surrounding line end characters
> + * might be. It might not even be under postgres' control. So
> + * we simply warn on ANY embedded line ending character.
> + *
> + * This warning will disappear when we make line parsing field-aware,
> + * so that we can reliably read in embedded line ending characters
> + * regardless of the file's line-end context.
> + *
> + */
> +
> + if (!embedded_line_warning && (c == '\n' || c == '\r') )
> + {
> + embedded_line_warning = true;
> + elog(WARNING,
> + "CSV fields with embedded linefeed or carriage return "
> + "characters might not be able to be reimported");
> + }
> +
> if (c == delimc || c == quotec || c == '\n' || c == '\r')
> use_quote = true;
> if (!same_encoding)
>
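The warn-once behaviour in the patch above can be tried outside the backend. What follows is only a minimal standalone sketch, not the copy.c code: embedded_line_warning is the flag name from the patch, while check_field() and simulate_copy() are hypothetical helpers invented for illustration, and fprintf() stands in for elog(WARNING, ...).

```c
#include <stdbool.h>
#include <stdio.h>

/* Same name as the flag the patch adds; everything else here is hypothetical. */
static bool embedded_line_warning;

/*
 * Scan one field and warn on the first CR or LF seen since the flag was
 * cleared.  Returns 1 if this call issued the warning, 0 otherwise.
 */
static int check_field(const char *field)
{
    for (const char *p = field; *p != '\0'; p++)
    {
        if (!embedded_line_warning && (*p == '\n' || *p == '\r'))
        {
            embedded_line_warning = true;
            /* stand-in for elog(WARNING, ...) in the real backend */
            fprintf(stderr,
                    "WARNING: CSV fields with embedded linefeed or carriage "
                    "return characters might not be able to be reimported\n");
            return 1;
        }
    }
    return 0;
}

/*
 * Simulate one COPY: clear the flag, then scan every field.
 * Returns how many warnings were issued during this run.
 */
static int simulate_copy(const char *fields[], int nfields)
{
    int warnings = 0;

    embedded_line_warning = false;  /* cleared at the start of each COPY */
    for (int i = 0; i < nfields; i++)
        warnings += check_field(fields[i]);
    return warnings;
}
```

Calling simulate_copy() on a set of fields where several contain CR or LF should yield a single warning per call, and a second call warns again because the flag is cleared at the start of each run.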
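To illustrate why the patch comment says embedded line endings might not reimport until line parsing is field-aware, here is a rough sketch comparing two ways of counting records in CSV text. It is not how copy.c reads input; both function names are hypothetical, and the sketch assumes LF-terminated records and a quote character that is escaped by doubling it.

```c
#include <stdbool.h>

/*
 * Not field-aware: every '\n' is taken as the end of a record, so a quoted
 * field containing a newline is split across two "records".
 */
static int count_records_naive(const char *data)
{
    int n = 0;

    for (const char *p = data; *p != '\0'; p++)
        if (*p == '\n')
            n++;
    return n;
}

/*
 * Field-aware: track whether we are inside a quoted field and treat a
 * newline there as data.  A doubled quote toggles the state twice, so it
 * leaves the state unchanged.
 */
static int count_records_field_aware(const char *data, char quotec)
{
    int n = 0;
    bool in_quote = false;

    for (const char *p = data; *p != '\0'; p++)
    {
        if (*p == quotec)
            in_quote = !in_quote;
        else if (*p == '\n' && !in_quote)
            n++;
    }
    return n;
}
```

For input such as a two-record file whose first record has a quoted field spanning two lines, the naive count comes out one higher than the field-aware count, which is the mismatch the warning is about.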
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073