Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

From: Nicolas Gouteux <nicolas(dot)gouteux(at)sonarsource(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation
Date: 2023-08-09 14:47:27
Message-ID: CANyQZ-60Y0wYMNTzSN6O=XkEVgrfvMmKAR7nDv=jgaF1apEa2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi David

Thanks a lot for the explanation.

Don't get me wrong, I almost never used the char datatype for my almost 40
years of DB programming.

Char is a reminiscence from the time when VAX and IBM machines had a single
mantra: fixed length data
The fact that white spaces are insignificant is all good. Nobody would like
to pad these days.
I hold for proof that client code is full of ltrim() rtrim() all the time...

I was just trying to show more junior programmers the difference between
the 'assumed' padded char and the more modern 'trimming' varchar
So imagine my surprise when both length() and concatenation() yielded the
exactly opposite behavior than expected!

I come from Sybase & Oracle, and I can tell you data length(char_type)
always return the length of the field.

I don't have any issue with this discrepancy with other vendors, but I
believe it's good to know (and maybe advertise in the docs?

Thanks again for taking the time to respond

Nicolas Gouteux | Sonar

https://sonarsource.com
Are you using SonarLint <https://www.sonarlint.org> in your IDE?

On Wed, 9 Aug 2023 at 16:38, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Thu, 10 Aug 2023 at 02:05, Nicolas Gouteux
> <nicolas(dot)gouteux(at)sonarsource(dot)com> wrote:
> > Sorry, but I still do not get it:
>
> I think the key is this sentence from the documentation:
>
> "Trailing spaces are removed when converting a character value to one
> of the other string types."
>
> Your concatenation example from earlier required that the type be
> implicitly cast to another type as we have no char-to-char
> concatenation operator, per:
>
> postgres=# select oprname,oprleft::regtype,oprright::regtype,oprcode
> from pg_operator where oprname = '||';
> oprname | oprleft | oprright | oprcode
> ---------+--------------------+--------------------+-----------------
> || | anycompatiblearray | anycompatible | array_append
> || | anycompatible | anycompatiblearray | array_prepend
> || | anycompatiblearray | anycompatiblearray | array_cat
> || | text | text | textcat
> || | bit varying | bit varying | bitcat
> || | bytea | bytea | byteacat
> || | text | anynonarray | textanycat
> || | anynonarray | text | anytextcat
> || | tsvector | tsvector | tsvector_concat
> || | tsquery | tsquery | tsquery_or
> || | jsonb | jsonb | jsonb_concat
> (11 rows)
>
> Concatenating two char(10)s would just use textcat(), so the above
> line from the doc applies since type conversion is required.
>
> > - char type adds padding up to its length
> > - select length(charcol), char_length(charcoal)
> > both yield 1
>
> There is a length function (bpcharlen) for char, per:
>
> postgres=# select proargtypes[0]::Regtype,prosrc from pg_proc where
> proname = 'length';
> proargtypes | prosrc
> -------------+--------------------
> text | textlen
> character | bpcharlen
> lseg | lseg_length
> path | path_length
> bytea | length_in_encoding
> bit | bitlength
> bytea | byteaoctetlen
> tsvector | tsvector_length
> (8 rows)
>
> However, that function does:
>
> /* get number of bytes, ignoring trailing spaces */
> len = bcTruelen(arg);
>
> and that wasn't by accident, per:
>
> commit f27976c85b1fb9002727cce65b9f9567e158f754
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date: Sun Feb 1 06:27:48 2004 +0000
>
> Make length() disregard trailing spaces in char(n) values, per
> discussion
> some time ago and recent patch from Gavin Sherry. Update documentation
> to point out that trailing spaces are insignificant in char(n).
>
> unfortunately, we didn't link to discussions in commit messages back
> then. It might be worth you searching the archives shortly before
> that date to see if anything interesting comes up.
>
> As mentioned by David, there's not much call for using char(N) in
> PostgreSQL. I don't know the history, but I always imagined char(N)
> existence was owed to improved internal optimizations in RDBMS
> implementations that might have been possible if the tuples were
> fixed-sized. We have no such advantages in PostgreSQL as even a
> char(N) will be stored as a variable length field. In theory, we could
> take some advantage in that as tuple deformation becomes less
> efficient for columns that come after a variable length field due to
> the offset into the tuple not being fixed, however, because we store
> char(N)s as variable length, we can't take advantage of that and it's
> too late as changing it would change the binary format of the type.
>
> David
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Nicolas Gouteux 2023-08-09 14:49:17 Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation
Previous Message David Rowley 2023-08-09 14:43:34 Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation