From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "jian he" <jian(dot)universality(at)gmail(dot)com>, "Tatsuo Ishii" <ishii(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: New "raw" COPY format |
Date: | 2024-11-02 11:08:31 |
Message-ID: | 39c01141-1f65-4e9d-82dd-976b4945a0f2@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote:
> As I mentioned in a separate email, if we use the OS default EOL as
> the default EOL in raw format, it would not be necessary to allow it
> to be multi characters. I think it's worth considering it.
I like the idea, but not sure I understand how it would work.
What if a user's OS default is \n (LF) and this user wants
to import a Windows text file \r\n (CR LR), which is a
multi characters EOL delimiter.
Was your idea to make an exception for that particular EOL,
or to simply not support that edge case?
> ---
> * copyfromparse.c
> * Parse CSV/text/binary format for COPY FROM.
>
> We need to update here as well.
>
> --
> - * 4. CopyReadAttributesText/CSV() function takes the input line from
> + * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
>
> I think we don't have CopyReadAttributesRaw() function now.
>
> ---
> I think it would be better to explain how to parse data in raw mode,
> especially which steps in the pipeline we skip, in the comment at the
> top of copyfromparse.c.
>
> ---
> - if (cstate->opts.format == COPY_FORMAT_CSV)
> + if (cstate->opts.format == COPY_FORMAT_RAW)
> {
> - /*
> - * If character is '\r', we may need to look ahead below. Force
> - * fetch of the next character if we don't already have it. We
> - * need to do this before changing CSV state, in case '\r' is also
> - * the quote or escape character.
> - */
> - if (c == '\r')
> + if (delim_len == 0)
> {
> - IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
> + /* When reading entire file, consume all remaining
> bytes at once */
> + input_buf_ptr = copy_buf_len;
> + continue;
> }
> + else
>
> I think that the code become much simpler if we do 'continue' at the
> end of 'if (cstate->opts.format == COPY_FORMAT_RAW)' branch, instead
> of doing 'if (cstate->opts.format == COPY_FORMAT_RAW) {} else ...'.
>
> ---
> @@ -1461,6 +1536,7 @@ CopyReadLineText(CopyFromState cstate)
> return result;
> }
>
> +
> /*
> * Return decimal value for a hexadecimal digit
> */
> @@ -1937,7 +2013,6 @@ endfield:
> return fieldno;
> }
>
> -
> /*
> * Read a binary attribute
> */
>
> Unnecessary line addition and removal.
Thanks for review, I agree on all comments,
awaiting your comment on EOL before implementing changes.
/Joel
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2024-11-02 11:43:10 | Re: Fix for Extra Parenthesis in pgbench progress message |
Previous Message | Nitin Jadhav | 2024-11-02 11:04:27 | Re: Count and log pages set all-frozen by vacuum |