Re: Case sensitivity

From: "Frank Millman" <frank(at)chagford(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Case sensitivity
Date: 2005-08-09 07:35:25
Message-ID: VPOP32.1.0e.20050809093619.625.7.1.a40c1f81@chagford.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Frank Millman wrote:

> Hi all
>
> Is there an LC_COLLATE setting, or any other method, which allows all
> data in a database to be treated in a case-insensitive manner?

I was hoping to stimulate some discussion on this topic, but it seems I will
have to kick-start it myself and see if anyone responds.

My area of interest is general accounting/business systems, using a typical
Western character set. I would imagine that this is a common scenario, but
it is not universal, so read my comments in this context.

In the good old days, data entry was always done in upper case, by dedicated
data entry personnel. These days, it is typically done by a wide variety of
individuals, who carry out a wide range of tasks, most of which require
lower case (word processing, email) with the occasional use of the shift key
to enter a capital letter.

In this context, here are two undesirable effects.

1. The user tries to call up account 'A001', but they enter 'a001'. First
problem, the system does not find the account. Second problem, the system
allows them to create a new account with the code 'a001'. Now you have
'A001' and 'a001'. This is a recipe for chaos.

2. The user tries to call up a product item using a search string on the
description. Assume they enter 'Wheel nut'. Assume further that the person
who created the product item used the description 'Wheel Nut'. Try
explaining to the user why the system cannot find the item they are looking
for.

I mentioned in my original post that there are workarounds for these
problems. However, it seems to me that in a typical system you would want to
apply the workaround on every table, and therefore there is a case for
saying that the database should handle it.

I have some experience of two other database systems, and it is of interest
to see how they handle it.

1. D3 (the latest implementation of the old Pick Database System). In the
early days it was case sensitive. When they brought out a new version in the
early 90's they changed it to case insensitive. As you would expect, an
upgrade required a full backup and restore. I was involved in many of these,
some of them quite large. On two occasions I found that accounts were out of
balance after the restore, and on investigation found that situations
similar to 'A001' 'a001' had crept into the old database, and on restore the
second insert was rejected as the first one already existed. When this was
explained to the user, the reaction was always concern that this 'error' had
been allowed to happen, and relief that the new version ensured that it
could never happen again.

2. MSSQL Server 2000. Each time you create a new database you have to
specify which 'collation' to use. There is a wide range available, both case
sensitive and case insensitive. The default (on my system at least) is case
insensitive, and I believe that in practice this is what most people want.

There may well be counter-arguments to this being handled by the database,
and I would be interested to hear them. However, from my point of view, if
this capability is not currently available in PostgreSQL, I would like to
propose that it is considered for some future release.

Looking forward to any comments.

Frank Millman

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sander Steffann 2005-08-09 08:00:08 Re: [Pgsqlrpms-hackers] Re: AMD 64 RPM?
Previous Message Oleg Bartunov 2005-08-09 07:03:37 Re: Adjacency List & total item counts