Re: One-off attempt at catalog hacking to turn bytea column into text

From: Noah Misch <noah(at)leadboat(dot)com>
To: Vlad Romascanu <vromascanu(at)accurev(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: One-off attempt at catalog hacking to turn bytea column into text
Date: 2011-05-11 08:11:08
Message-ID: 20110511081108.GA26552@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, May 10, 2011 at 04:31:37PM -0400, Vlad Romascanu wrote:
> As a one-off attempt to change a large table's 'bytea' column to
> 'text' with minimal I/O (where the 'bytea' contents is already valid
> UTF8 and the database encoding is also UTF8, and the column is not
> part of any index or anything involving collation), how unsafe is the
> following?
>
> UPDATE pg_attribute SET atttypid='text'::regtype::oid WHERE
> attrelid=('schema_name.table_name')::regclass AND attname='col_name'
> AND atttypid='bytea'::regtype::oid;

Do a "LOCK TABLE schema_name.table_name" earlier in the same transaction.

Any of the following potentially complicates things:
- data is not valid text in the server encoding, including NUL bytes
- use in an index (you mentioned this does not apply)
- use in a CHECK or FOREIGN KEY constraint
- default expression on the column
- views or rules referencing the column
- referenced directly in a trigger definition (UPDATE OF col_name, WHEN ...)

A useful, though not foolproof, way to check whether you've snagged on any of
those is to take schema dumps of a test database after (a) doing the change
normally and (b) doing it this way, then compare.

> Additionally, if the 'bytea' happenned to also explicitly contain a
> trailing NUL prior to the "conversion" (i.e. after the hack, the last
> byte in the 'text' value would be NUL), would there be any obvious
> problems with the above hack?

Yes; this will break things in the general case. Two text datums that differ
only in the presence of this trailing NUL will compare as unequal by texteq().
The octet_length() function will include the NUL byte in its count. If you use
the column in very restricted ways, you might end up okay. Note that you could
avoid these problems by updating the bytea columns to remove these NUL bytes
before making the catalog update. It could still be much I/O, but at least you
wouldn't have a table lock while doing it.

nm

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Stanislav Raskin 2011-05-11 11:19:38 full text search to_tsquery performance with ispell dictionary
Previous Message mephysto 2011-05-11 08:03:25 Custom types and JDBC