Re: Camel case identifiers and folding

From: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Camel case identifiers and folding
Date: 2019-03-18 11:18:03
Message-ID: CAKqnccgk9V79jyGOV=0O4PEbx8r97OWMdRTsJTQXrwGe1haLVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Sounds like I may have touched a nerve with some. If so, no offense
intended!

There are cases where case-sensitivity is required or desirable, it would
be silly to argue otherwise. Where you have such cases, then case-sensitive
queries are great. Some RDBMS systems default to case-sensitive searches,
others default to case-blind searches. I'll take that to mean that either
choice has merit, and it's not foolish to prefer either. I need case-blind
searches vanishingly close to 100% of the time, but other people have
different conditions and may find that they almost never need case-blind
searches. I find this extreme hard to imagine, but find it easy to imagine
people who need case-sensitive searches quite often.

What I've been thinking about most are user-driven searches on
user-oriented data. Users, at least any user I've ever had, don't want
case-sensitive searches. They also don't care about diacritical characters,
at least in English. I worked for a French company for many years.
Diacritical searches were not always preferred, but sometimes were. It
depends on your user community and their requirements and norms. Sometimes
it comes down to an individual user. Options are good! I was really just
trying to warn someone coming from a base-blind default about Postgres
behavior because, well, it hurts if you aren't expecting it. That doesn't
make Postgres wrong (it's not that kind of a choice), but it is important
to know about.

I'm new to Postgres (only about a year in), and it's great. But I'm used to
a case-blind search as a default. And, honestly, I can *never* remember a
case when a user asked for a case-sensitive search. Ever. In 30+ years.
Maybe it's just me. Just kidding, it's not just me. If you're presenting
users with a search interface, you can find out by asking them. Or you can
AB test a search UI where there is the option of case-sensitive/blind
searching, but you randomly flip which is the default. For users,
case-sensitive searches are assumed. That's what Google does. Seriously,
Google === Search. It's not a hard test to run. If you find that with a
case-blind search, 30% of user tick the box to make it case-sensitive, then
you've got users that often do care about case-sensitive search.

And since it seems to be unclear at a few places in the discussion above: *It
absolutely makes sense to store data in its original form* and to allow for
case-sensitive searches, when required. It would be very weird to store

* Call me back Ishmael, I've gotta go...*

and get back anything else, be that

* call me back ishmael, i've gotta go...*

or

* CALL ME BACK ISHMAEL, I'VE GOTTA GO...*

As far as I understand it in Postgres, you can:

* Use something like UPPER or LOWER in Every. Single. Search.

* Fold text to one case consistently in an index to make searches
case-blind.

* Use citext.

* Teach and convince every user to enter and search for data
case-sensitively correctly Every. Single. Time.

On that last point, good luck. Here's an example, I'm keen on birds. Do you
write it:

Black-shouldered Kite
Black-Shouldered Kite

On Sun, Mar 17, 2019 at 8:46 AM Peter J. Holzer <hjp-pgsql(at)hjp(dot)at> wrote:

> On 2019-03-16 14:00:34 -0600, Rob Sargent wrote:
> > What sort of content is in your field of type text? Certainly,
> in
> > English
> > prose, “rob” is different than “Rob”
> >
> >
> > I disagree. While the grammar for written English has rules when to
> > write "rob" and when to write "Rob", that distinction usually
> carries no
> > semantic difference. Consider:
> [...]
> > I don’t think it’s solely about the semantics. One might be
> contractually
> > obligated to always spell a name in some exact way including it
> capitalization.
> > For instance if referring to "Rob Sargent” as a quote or accreditation,
> then
> > it’s not okay to let a typo “rob Sargent” go through.
>
> 1) Such contracts might exist, but they are only binding to the signing
> parties, they don't affect what is commonly understood as "the English
> language". Everybody else will see it as an obvious typo and won't
> assume that this refers to some "rob Sargent" who is a different person
> than "Rob Sargent".
>
> 2) I don't think the OP was talking about spell-checking. And in any
> case spell-checking is more complicated than simply comparing strings
> byte by byte.
>
> hp
>
>
> --
> _ | Peter J. Holzer | we build much bigger, better disasters now
> |_|_) | | because we have much more sophisticated
> | | | hjp(at)hjp(dot)at | management tools.
> __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
> -----BEGIN PGP SIGNATURE-----
>
> iQIzBAABCAAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAlyNbrwACgkQ8g5IURL+
> KF2mKhAAq8b9RLFBihsO+dQf3pxjLt3bWoD9mhgMEy8GvxADuxAF4dLo1SqfX2S0
> aIZFEyQgMoNKWOaiylj7TYPiBRilfVSCxggisPsirUKLKpXr4Bw9oIGiPBiE+21m
> ajlaONOZNiaM9D+BFthFkPM0TcjR2FHTaXOch0HbFnPnDWMgEPwY9yyDeN8ZPCOn
> 7G002EB3wxHcnhoFm8jGO2E8SL9l0NLU6+CVlCPAenGI0H6gbFatHi1LKTmNe+PH
> d32Il7Pf4GE0o/k92X0Yil3oqMSU/vV08LSrMVGxUDVAz+cL0/W40V/3BthELi95
> l7kGJRJfgic7LU7IK84yszxR1ZJt1aBLyFIUpEoEsPl6XuSQsTOm/sFt/1C0L8Io
> iLYNTnLh2CveKt/bldITAAPlHZ1rF92BUsAJRA2Wci5yv5V7orykDPgE8HzhB5U9
> m1cHfjInqMi1ogGgSLw0sPH6ZDlT76p1H8RjjQQhD9eAFNYt29xFOUARyyWHVEut
> ZyWMIeHzWpLRr/h+5y6uJ+LhH4NVMONc3unE6fy1MgIZtIcmeFrbx8+7XGKmv/kD
> RY12aNOrCF85HdiIRd/ImXOsCWWiq7u7itZQrsqhQVRXNMnV/7M4Hlpe1rBeQ9V+
> k2FXuFM/eEtyEO+KKYwct+1iYerTTKY6hyCQwBINVewe/Fohd+A=
> =8B/x
> -----END PGP SIGNATURE-----
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Morris de Oryx 2019-03-18 11:20:11 Re: Camel case identifiers and folding
Previous Message Karsten Hilbert 2019-03-18 10:49:25 Re: s3 was: Where to store Blobs?