From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Chuck McDevitt <cmcdevitt(at)greenplum(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Optimizing COPY |
Date: | 2008-11-12 16:21:43 |
Message-ID: | 491B0297.5080903@enterprisedb.com |
Lists: | pgsql-hackers |
Chuck McDevitt wrote:
> What if the block of text is split in the middle of a multibyte character?
> I don't think it is safe to assume raw blocks always end on a character boundary.
Yeah, it's not. I realized that myself after submitting. The generic approach
is to loop with pg_mblen() to find the maximum safe length. For UTF-8,
and probably many other multi-byte encodings as well, we can detect
whether a byte is the first byte of a multi-byte character just by
looking at its few highest bits.
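
As a rough illustration of the UTF-8 case only, the high-bit check could look something like the sketch below. The names utf8_seq_len() and utf8_safe_length() are hypothetical and are not taken from the patch or from the PostgreSQL source; the sketch also assumes the input is otherwise valid UTF-8 and leaves malformed sequences for later validation to reject.

```c
#include <stddef.h>

/*
 * Hypothetical helper: expected length of a UTF-8 sequence, judged only
 * from the high bits of its first byte. Invalid lead bytes are treated
 * as length 1 so that later validation can report them.
 */
static size_t
utf8_seq_len(unsigned char lead)
{
    if ((lead & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 1;
}

/*
 * Hypothetical helper: largest prefix of buf[0..len) that does not end
 * in the middle of a UTF-8 character. Looks at no more than the last
 * four bytes of the block.
 */
static size_t
utf8_safe_length(const unsigned char *buf, size_t len)
{
    size_t      i = len;
    int         back = 0;

    while (i > 0 && back < 4)
    {
        i--;
        back++;
        /* Continuation bytes look like 10xxxxxx; anything else starts a character. */
        if ((buf[i] & 0xC0) != 0x80)
        {
            if (i + utf8_seq_len(buf[i]) > len)
                return i;       /* last character is truncated: cut before it */
            return len;         /* last character is complete */
        }
    }

    /* Empty block, or a tail of only continuation bytes: leave it to validation. */
    return len;
}
```

The bytes from the returned length up to the end of the block would then be carried over and prepended to the next raw block before conversion.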
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com