LIKE, CHAR(), and trailing spaces

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: LIKE, CHAR(), and trailing spaces
Date: 2011-02-02 23:54:42
Message-ID: 201102022354.p12NsgR06340@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I found a little LIKE/CHAR() surprise --- below is a table and query
against a CHAR(10) field:

test=> CREATE TABLE test (x char(10));
CREATE TABLE

test=> INSERT INTO test values ('hi');
INSERT 0 1

test=> SELECT * FROM test WHERE x = 'hi';
x
------------
hi
(1 row)

The above works because both sides are converted to 'bpchar'; explain
shows that:

test=> EXPLAIN SELECT * FROM test WHERE x = 'hi';
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..33.12 rows=9 width=14)
Filter: (x = 'hi'::bpchar)
^^^^^^
(2 rows)

The following does not work:

test=> SELECT * FROM test WHERE x LIKE 'hi';
x
---
(0 rows)

It seems LIKE is considering the trailing CHAR(10) field spaces as
significant, even though our documentations says:

Values of type <type>character</type> are physically padded
with spaces to the specified width <replaceable>n</>, and are
stored and displayed that way. However, the padding spaces are
treated as semantically insignificant. Trailing spaces are
--> disregarded when comparing two values of type <type>character</type>,
and they will be removed when converting a <type>character</type> value
to one of the other string types. Note that trailing spaces
<emphasis>are</> semantically significant in
<type>character varying</type> and <type>text</type> values.

It says trailing spaces are not significant for character comparisons
--- the real question is whether LIKE is a comparison. Obvioiusly '='
is a comparison, but the system does not treat LIKE as a comparison in
terms of trailing spaces. Is that desired behavior?

I did an EXPLAIN on the query and found '~~' was being used and 'hi' was
being converted to text:

test=> explain select * from test where x like 'hi';
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..33.12 rows=9 width=14)
Filter: (x ~~ 'hi'::text)
^^ ^^^^
(2 rows)

so I then checked psql \do to see what operators there were for ~~:

test=> \do ~~
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Description
------------+------+---------------+----------------+-------------+-------------------------
pg_catalog | ~~ | bytea | bytea | boolean | matches LIKE expression
--> pg_catalog | ~~ | character | text | boolean | matches LIKE expression
pg_catalog | ~~ | name | text | boolean | matches LIKE expression
pg_catalog | ~~ | text | text | boolean | matches LIKE expression
(4 rows)

The one marked matches the arguments so it seems the comparison being
done is not character and character, but character and text.

I realize trim() could be used to get the desired behavior, but is our
behavior consistent?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2011-02-03 00:03:06 Re: [HACKERS] Slow count(*) again...
Previous Message Tom Lane 2011-02-02 23:42:56 Re: ALTER EXTENSION UPGRADE, v3