Re: Performance impact of record sizes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Moore <postgres(at)tinyvital(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Postgresql Admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Performance impact of record sizes
Date: 2002-07-04 23:49:45
Message-ID: 6353.1025826585@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

John Moore <postgres(at)tinyvital(dot)com> writes:
> So I *suspect* I want to keep the data in the physical row, rather than
> using TEXT and having it stored separately from the record.

You seem to be reading something into the TEXT type that's not there;
perhaps you are carrying over associations from some other DBMS?
Here's how it works in Postgres:

The *only* performance difference between TEXT and VARCHAR(n) is that
for a VARCHAR(n) column, on insert or update there's an extra function
call applied to the new value to verify that its length is within N.
TEXT makes no such check; so TEXT has a small performance advantage if
there are lots of updates. On the SELECT side there's no difference.

Both TEXT and VARCHAR(n) data is stored using TOAST (automatic
compression or out-of-line storage) if the row gets too big. There
is no difference between the two datatypes on this score.

For the particular application you describe I think TOAST storage will
be ideal, since the overhead of compression or out-of-line storage is
only paid on the few rows where it's needed.

What you really ought to be asking yourself is whether (for this
application) a hard limit on the field width makes sense at all.
IMHO you should only use VARCHAR(n) if you can make a defensible
argument for the particular value of N you are using. If you can't
explain why you are setting the field width limit, then you probably
do not need a specific limit at all, and should be using TEXT.

> A related question: is it more expensive to use varchar than fixed char
> fields?

There is no savings from using CHAR(n) --- most of the time it's a
substantial loss, because of the extra I/O costs associated with the
space used by all those padding blanks. I can think of very very few
applications where CHAR(n) is really a sensible choice over VARCHAR(n).
US state postal codes (CHAR(2)) are an example, but beyond that the
valid use cases are mighty thin on the ground.

Bruce, it occurs to me that the "Performance Tips" section of the user's
guide ought to have an explanation of the performance implications of
TOAST. We have some material in the FAQ but there's precious little
mention in the manual. Thoughts?

regards, tom lane

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message John Moore 2002-07-04 23:56:30 Re: Performance impact of record sizes
Previous Message Bruce Momjian 2002-07-04 20:39:01 Re: Performance impact of record sizes