Re: finding bogus UTF-8

From: Geoffrey Myers <lists(at)serioustechnology(dot)com>
To: Vick Khera <vivek(at)khera(dot)org>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: finding bogus UTF-8
Date: 2011-02-15 22:06:07
Message-ID: 4D5AF8CF.9080001@serioustechnology.com
Lists: pgsql-general

Vick Khera wrote:
> On Tue, Feb 15, 2011 at 11:09 AM, Geoffrey Myers
> <lists(at)serioustechnology(dot)com> wrote:
>> comments would be appreciated.
>>
>
> If all you're doing is filtering stdin to stdout and deleting a range
> of characters, it seems that tr would be a faster tool:
>
> cat foo.txt | tr -d '\000-\008\013-\037\177-\377' > foo-cleaned.txt

I toyed with tr for a bit, but could not get it to work. The above did
not work for me either. I'm not exactly sure what it's doing, but here
is a representative hunk from the diff:

1619c1619
< days integer DEFAULT 28,
---
> days integer DEFAULT 2,

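So it appears 'tr' is deleting the '8' character, rather than the
character with octal value 008. Presumably that is because 8 is not an
octal digit, so '\008' gets read as '\00' followed by a literal '8'.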
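
For reference, here is a minimal sketch of the same filter with \010
(backspace) in place of \008. It assumes the intent was to delete the
control bytes up through backspace while keeping tab (\011) and newline
(\012). The function name clean_bytes is hypothetical, and LC_ALL=C is
an added assumption to make tr work byte by byte regardless of locale.

```shell
# Hypothetical helper: delete control bytes and everything from octal 177
# upward, keeping tab (\011) and newline (\012).
# LC_ALL=C is an assumption to force byte-wise handling.
clean_bytes() {
    LC_ALL=C tr -d '\000-\010\013-\037\177-\377'
}

# The digit 8 survives and the trailing backspace byte (\010) is removed.
printf 'days integer DEFAULT 28,\010\n' | clean_bytes
# prints: days integer DEFAULT 28,
```

Like the quoted command, this also deletes every byte from octal 177 up,
so it strips all non-ASCII text, valid UTF-8 included.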

--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson
