Identifier case folding notes

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Identifier case folding notes
Date: 2008-07-08 17:25:38
Message-ID: 200807081925.40467.peter_e@gmx.net
Lists: pgsql-hackers

I have had some idle thoughts on the issue of identifier case folding. Maybe
we can collect our ideas and inch closer to a solution sometime. Or we
determine that it's useless and impossible, but then I can at least collect
that result in the wiki and point future users to it.

Background: PostgreSQL folds unquoted identifiers to lower case. SQL
specifies folding to upper case, and most other SQL DBMS do it that way. If
an application mixes referring to an object using quoted and unquoted
versions of an identifier, then incompatibilities arise. We have always
stressed to users that one should refer to an object always unquoted or
always quoted. While that remains a good suggestion for a number of reasons,
we have seen occasional complaints that some closed source applications
violate this rule and therefore cannot be run on PostgreSQL.
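
To make the incompatibility concrete, here is a minimal sketch in Python of the two folding rules. It is an illustration only, not PostgreSQL's lexer: the function name fold_identifier and its "fold" parameter are made up for this example, and Python's str.lower()/str.upper() stand in for the server's actual case conversion.

```python
def fold_identifier(token: str, fold: str = "lower") -> str:
    """Return the name an identifier token resolves to (hypothetical helper).

    Quoted tokens ("...") keep their case; unquoted tokens are folded.
    """
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Quoted identifier: strip the quotes, undouble embedded quotes, keep case.
        return token[1:-1].replace('""', '"')
    return token.lower() if fold == "lower" else token.upper()


# An application creates a table unquoted and later refers to it quoted
# in upper case.
created = "Foo"
referenced = '"FOO"'

# Folding to upper case (SQL standard): both resolve to FOO, the lookup works.
assert fold_identifier(created, "upper") == fold_identifier(referenced, "upper")

# Folding to lower case (PostgreSQL): foo versus FOO, the lookup fails.
assert fold_identifier(created, "lower") != fold_identifier(referenced, "lower")
```

An application that sticks to either form alone gets the same result under both rules, which is why the usual advice works.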

A number of solutions have been proposed over time, which I summarize here:

1. Change the lexer to fold to upper case, as it is supposed to do according
to the SQL standard. This will break almost everything, because almost all
built-in objects have lower-case names and thus couldn't be referred to
anymore except by quoting. Changing the names of all the internal objects to
upper-case names would involve vast code changes, probably break just as much,
and make everything uglier. So this approach is unworkable.

2. Fold to upper case, but not when referring to built-in objects. The lexer
generally doesn't know what a name will refer to, so this is not possible to
implement, at least without resorting to lots of hard-coding or horrid
kludges. Also, a behavior like this will probably create all kinds of weird
inconsistencies, resulting from putting catalog knowledge in the lexer.

3 and 4 -- Two variants of "ignore case altogether":

3. Fold all identifiers to lower case, even quoted ones. This would probably
in fact fix the breakage of many of the above-mentioned problem applications,
and it would in general be very easy to understand for a user. And it could
be implemented in about three lines. One disadvantage is that one could no
longer have objects that have names different only by case, but that is
probably rare and incredibly stupid and can be neglected. The main
disadvantage is that the case of identifiers and in particular column labels
is lost. So applications and programming interfaces that look up result
columns in a case-sensitive manner would fail. And things like SELECT expr AS
"Nice Heading" won't work properly anymore.

4. Compare the "name" data type in a case-insensitive manner. This would
probably address most problem cases. Again, you can't have objects with names
different in case only. One condition for implementing this would be that this
behavior would have to be tied down globally at initdb, because it affects system
indexes and shared catalogs. That might be impractical for some, because
you'd need different instances for different behaviors, especially when you
want to host multiple applications or want to port an affected application to
the native PostgreSQL behavior over time.

5. One additional approach I thought of is that you swap the case of
identifiers as you lex them (upper to lower, lower to upper), and then swap
them back when you send them to the client. This needs a small change in the
lexer, one for sending the RowDescription, and support in pg_dump and a few
other places if desired. There will, however, be a number of weird, albeit
self-imposed, side-effects. I have implemented a little test patch for this.
It's weird, but it works in basic ways.
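
The case-swapping idea in 5 can be sketched roughly as follows, again in Python and not taken from the test patch. The sketch assumes one possible reading of the scheme: unquoted identifiers are still folded to lower case (which, for ASCII, equals folding to upper case and then swapping), while quoted identifiers have the case of every letter swapped. The names lex_identifier and send_to_client are hypothetical. Note that swapcase() only round-trips reliably for ASCII letters.

```python
def lex_identifier(token: str) -> str:
    """Map an identifier token to the internally stored name (assumed scheme)."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Quoted: keep the spelling, but swap the case of every letter.
        return token[1:-1].replace('""', '"').swapcase()
    # Unquoted: fold to lower case. For ASCII this equals "fold to upper
    # case as the standard says, then swap".
    return token.lower()


def send_to_client(internal_name: str) -> str:
    """Swap the case back, e.g. for column labels in the RowDescription."""
    return internal_name.swapcase()


# Unquoted foo and quoted "FOO" now name the same object, as in the standard.
assert lex_identifier("foo") == lex_identifier('"FOO"') == "foo"

# Quoted "foo" is a different object, also as in the standard.
assert lex_identifier('"foo"') != lex_identifier("foo")

# Built-in lower-case names remain reachable without quoting.
assert lex_identifier("pg_class") == "pg_class"

# Column labels survive the round trip, unlike with approach 3.
assert send_to_client(lex_identifier('"Nice Heading"')) == "Nice Heading"

# One of the side effects: the client sees unquoted names in upper case.
assert send_to_client(lex_identifier("foo")) == "FOO"
```

Under this reading, built-in objects would also show up in upper case on the client side.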

Obviously, no solution will ever work completely. And we probably don't want
such a solution, because it would create two different and incompatible
PostgreSQL universes. If we are aiming for a solution that would allow most
affected applications to hobble along, we would probably serve most users.
Implementing some or all of 3, 4, and 5 would probably achieve that.

Comments?
