From: | Klint Gore <kgore4(at)une(dot)edu(dot)au> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | Merlin Moncure <mmoncure(at)gmail(dot)com>, Sim Zacks <sim(at)compulab(dot)co(dot)il>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: bytea encode performance issues |
Date: | 2008-08-08 00:28:43 |
Message-ID: | 489B933B.4050000@une.edu.au |
Lists: | pgsql-general |
Alvaro Herrera wrote:
> Merlin Moncure escribió:
>
> > er, I see the problem (single piece of text with multiple encodings
> > inside) :-). ok, it's more complicated than I thought. still, you
> > need to convert the email to utf8. There simply must be a way,
> > otherwise your emails are not well defined. This is a client side
> > problem...if you push it to the server in ascii, you can't use any
> > server side text operations reliably.
>
> I think the solution is to get the encoding from the email header and
> then set the client_encoding to that. However, as soon as an email with
> an unsupported encoding comes by, you are stuck.
>
Why not leave it as bytea? The postgres server has no encoding problems
storing whatever you want to throw at it, and the postgres client has no
problem reading it back. It's then up to the imap/pop3/whatever client
to deal with it. That's normally the way the email server world works.

FWIW the RFC for email (822/2822) says it is all ASCII, so it's not a
problem at all as long as every email generator follows the IETF rules
(body here is not just the text of the message - it's the data after the
blank line in the SMTP conversation until the CRLF.CRLF).
"2.3. Body
The body of a message is simply lines of US-ASCII characters. "
The 2 things that will make a difference to the query are 1. get rid of
the encode call and 2. stop it being toasted.
Assuming that the dbmail code can't be changed yet:
1. make encode a no-op.
- create schema foo;
- create function foo.encode (bytea,text) returns bytea as $$ select
$1 $$ language sql immutable;
- change postgresql.conf search_path to foo,pg_catalog,....
This completely breaks encode, so if anything else uses it properly then
that is now broken. From the query we've seen, we don't know if it's needed
or not. What query do you get if you search for something that has UTF-8 or
other non-ASCII encoded characters? If it looks like the output of
escape (i.e. client used PQescapeByteaConn on the search text), then the
escape might be required.
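
A rough way to try step 1 in a single session before touching
postgresql.conf is sketched below. It is only a sketch: the `public` entry
in the search path is my assumption (the real path depends on the
installation), and the whole thing is wrapped in a transaction so it can be
rolled back.

```sql
-- Sketch only: try the override in one session before editing postgresql.conf.
BEGIN;

CREATE SCHEMA foo;

-- Same name and argument types as the built-in encode(bytea, text),
-- but it hands the bytea back untouched.
CREATE FUNCTION foo.encode(bytea, text) RETURNS bytea AS $$
    SELECT $1
$$ LANGUAGE sql IMMUTABLE;

-- Session-local stand-in for the postgresql.conf change. foo has to come
-- before pg_catalog, otherwise the built-in is found first.
-- "public" is an assumption; use whatever the installation really needs.
SET LOCAL search_path TO foo, pg_catalog, public;

-- Unqualified calls now resolve to foo.encode ...
SELECT encode('abc'::bytea, 'escape');

-- ... while the built-in is still reachable when schema-qualified.
SELECT pg_catalog.encode('abc'::bytea, 'escape');

ROLLBACK;  -- undo everything; COMMIT instead to keep the schema and function
```

Anything that genuinely needs the real function can keep calling
pg_catalog.encode explicitly, which limits the damage described above.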
2. dbmail already chunks email up into ~500k blocks. If that is a
configurable setting, turn it down to about 1.5k blocks, which should keep
each row under the roughly 2k size at which postgres starts toasting.
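
To see how much of the block data is likely being toasted at the moment,
something along these lines might help. The table and column names
(dbmail_messageblks, messageblk) are assumptions on my part and should be
checked against the actual dbmail schema. The 2000 byte cut-off is only an
approximation of the default toast threshold, which applies to the whole
row rather than to one column.

```sql
-- Assumed names: dbmail_messageblks / messageblk; adjust to the real schema.
SELECT count(*)                        AS blocks,
       avg(octet_length(messageblk))   AS avg_raw_bytes,
       avg(pg_column_size(messageblk)) AS avg_stored_bytes,  -- after any compression
       sum(CASE WHEN octet_length(messageblk) > 2000
                THEN 1 ELSE 0 END)     AS blocks_over_2000_bytes
FROM dbmail_messageblks;

-- Size of the TOAST relation that belongs to the same (assumed) table.
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
WHERE c.relname = 'dbmail_messageblks'
  AND c.reltoastrelid <> 0;
```

If nearly every block is over the cut-off and the toast relation holds most
of the data, a smaller block size is more likely to pay off.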
klint.
--
Klint Gore
Database Manager
Sheep CRC
A.G.B.U.
University of New England
Armidale NSW 2350
Ph: 02 6773 3789
Fax: 02 6773 3266
EMail: kgore4(at)une(dot)edu(dot)au