From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Fuhr <mike(at)fuhr(dot)org>, Mario Weilguni <mweilguni(at)sime(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Bug in UTF8-Validation Code? |
Date: | 2007-03-17 20:28:53 |
Message-ID: | 45FC4F85.7090804@dunslane.net |
Lists: | pgsql-hackers |
Tom Lane wrote:
> I wrote:
>
>> Actually, I have to take back that objection: on closer look, COPY
>> validates the data only once and does so before applying its own
>> backslash-escaping rules. So there is a risk in that path too.
>>
>> It's still pretty annoying to be validating the data twice in the
>> common case where no backslash reduction occurred, but I'm not sure
>> I see any good way to avoid it.
>>
>
> Further thought here: if we put encoding verification into textin()
> and related functions, could we *remove* it from COPY IN, in the common
> case where client and server encodings are the same? Currently, copy.c
> forces a trip through pg_client_to_server for multibyte encodings
> even when the encodings are the same, so as to perform validation.
> But I'm wondering whether we'd still need that. There's no risk of
> SQL injection in COPY data. Bogus input encoding could possibly
> make for confusion about where the field boundaries are, but bad
> data is bad data in any case.
>
> regards, tom lane
>
>
Here are some timing tests on 1m rows of random UTF8-encoded data, 100
chars per row. It doesn't look to me like the saving you're suggesting
is worth the trouble.
baseline:
Time: 28228.325 ms
Time: 25987.740 ms
Time: 25950.707 ms
Time: 25756.371 ms
Time: 27589.719 ms
Time: 25774.417 ms
after adding suggested extra test to textin():
Time: 26722.376 ms
Time: 28343.226 ms
Time: 26529.364 ms
Time: 28020.140 ms
Time: 24836.853 ms
Time: 24860.530 ms
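To make the comparison easier to read, here is a small sketch that summarizes the two sets of timings above. The numbers are copied from this message; the variable names are only illustrative. The two means differ by only a few milliseconds, which is far smaller than the run-to-run spread within either set.

```python
from statistics import mean, stdev

# Timings in milliseconds, copied from the two runs reported above.
baseline = [28228.325, 25987.740, 25950.707, 25756.371, 27589.719, 25774.417]
with_textin_check = [26722.376, 28343.226, 26529.364, 28020.140, 24836.853, 24860.530]

base_mean = mean(baseline)
check_mean = mean(with_textin_check)

# Difference between the means, and the sample standard deviation of each set,
# to show how the difference compares with ordinary run-to-run noise.
diff_ms = check_mean - base_mean
base_sd = stdev(baseline)
check_sd = stdev(with_textin_check)

print(f"baseline:          mean {base_mean:.1f} ms, stdev {base_sd:.1f} ms")
print(f"with textin check: mean {check_mean:.1f} ms, stdev {check_sd:.1f} ms")
print(f"difference of means: {diff_ms:+.1f} ms ({100 * diff_ms / base_mean:+.3f}%)")
```

With only six runs per case this is a rough indication rather than a proper benchmark.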
Script is:
\timing
create table xyz (x text);
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
drop table xyz;
Test platform: FC6, Athlon64.
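For anyone wanting to repeat the test, the following is a minimal sketch of one way to produce a comparable input file. It is not the generator used for the numbers above: the code point range, the excluded characters, and the function names are all assumptions made for illustration. It avoids backslash, tab, newline and control characters so that COPY's text format reads each line as a single field.

```python
import random

def random_utf8_line(rng, length=100):
    """Return `length` random characters that are safe in COPY text format.

    Assumption: code points are drawn from U+0020..U+2FFF, which contains no
    surrogates, so every result encodes cleanly as UTF-8.
    """
    chars = []
    while len(chars) < length:
        cp = rng.randint(0x20, 0x2FFF)
        # Skip backslash (COPY's escape character) and DEL/C1 control codes.
        if cp == 0x5C or 0x7F <= cp <= 0x9F:
            continue
        chars.append(chr(cp))
    return "".join(chars)

def write_data(path, rows, seed=0):
    """Write `rows` lines of random data to `path`, one field per line."""
    rng = random.Random(seed)
    # newline="\n" keeps line endings as plain LF on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for _ in range(rows):
            f.write(random_utf8_line(rng) + "\n")

# Example (slow for 1m rows in pure Python):
# write_data("/tmp/utf8.data", 1_000_000)
```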
cheers
andrew