
Re: Size and performance hit from using UTF8 vs. ASCII?

From:	"Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To:	Ron <rjpeace(at)earthlink(dot)net>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Size and performance hit from using UTF8 vs. ASCII?
Date:	2006-02-08 15:54:01
Message-ID:	1139414042.8707.7.camel@localhost.localdomain
Lists:	pgsql-performance

On Wed, 2006-02-08 at 09:11 -0500, Ron wrote:
> I'm specifically interested in the default C Locale; but if there's a
> difference in the answer for other locales, I'd like to hear about
> that as well.

The size hit will be effectively zero if your data is mainly of the
ASCII variety, since UTF-8 encodes every ASCII character as the same
single byte; the conversion is an identity transform. However, anything
involving string comparisons, including equality, similarity (LIKE,
regular expressions), or any other kind of comparison (ORDER BY,
GROUP BY), will be slower. In my experience the performance hit varies
from zero to 100% in CPU time. UTF-8 is never faster than ASCII, as far
as I know.
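As a rough illustration of the size point, here is a minimal sketch in plain Python (not PostgreSQL). The helper `encoded_sizes` and the sample strings are hypothetical. It checks that pure ASCII text is byte-for-byte identical in the ASCII and UTF-8 encodings, and shows that only non-ASCII characters take extra bytes in UTF-8.

```python
# Sketch: storage size of the same text in ASCII vs. UTF-8.
# Plain Python, not PostgreSQL; it only illustrates byte-level encoding.

def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (number of UTF-8 bytes, number of characters) for text."""
    return len(text.encode("utf-8")), len(text)

ascii_text = "SELECT name FROM users ORDER BY name;"

# For pure ASCII input, the two encodings produce identical bytes.
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Non-ASCII characters need 2 or more bytes each in UTF-8.
for sample in [ascii_text, "café", "naïve résumé", "日本語"]:
    nbytes, nchars = encoded_sizes(sample)
    print(f"{sample!r}: {nchars} characters, {nbytes} UTF-8 bytes")
```

For the ASCII query string, the byte count equals the character count. "café" grows by one byte, and each of the three CJK characters takes three bytes.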

However, if you need UTF-8 then you need it, and there's no point in
worrying about the performance hit.

You may as well just do two benchmark runs, with your database
initialized once in each character set, to see for yourself.

-jwb

In response to

Size and performance hit from using UTF8 vs. ASCII? at 2006-02-08 14:11:11 from Ron
