Re: Camel case identifiers and folding

From: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Camel case identifiers and folding
Date: 2019-03-18 11:20:11
Message-ID: CAKqncch3cou_omMDe6uehVG+aReoWmTpSB5gg9KF8dAjz58zHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

...hit send by accident

..or BLACK SHOULDERED KITE

You'll see all three regularly, depending on context and writer. It's
always the same bird. A natural search turns up all matches, regardless of
capitalization differences. The hyphens are hard....I'm a huge, huge, huge
fan of fuzzy string matching and n-grams, but that's an unrelated topic.

On Mon, Mar 18, 2019 at 10:18 PM Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
wrote:

> Sounds like I may have touched a nerve with some. If so, no offense
> intended!
>
> There are cases where case-sensitivity is required or desirable, it would
> be silly to argue otherwise. Where you have such cases, then case-sensitive
> queries are great. Some RDBMS systems default to case-sensitive searches,
> others default to case-blind searches. I'll take that to mean that either
> choice has merit, and it's not foolish to prefer either. I need case-blind
> searches vanishingly close to 100% of the time, but other people have
> different conditions and may find that they almost never need case-blind
> searches. I find this extreme hard to imagine, but find it easy to imagine
> people who need case-sensitive searches quite often.
>
> What I've been thinking about most are user-driven searches on
> user-oriented data. Users, at least any user I've ever had, don't want
> case-sensitive searches. They also don't care about diacritical characters,
> at least in English. I worked for a French company for many years.
> Diacritical searches were not always preferred, but sometimes were. It
> depends on your user community and their requirements and norms. Sometimes
> it comes down to an individual user. Options are good! I was really just
> trying to warn someone coming from a base-blind default about Postgres
> behavior because, well, it hurts if you aren't expecting it. That doesn't
> make Postgres wrong (it's not that kind of a choice), but it is important
> to know about.
>
> I'm new to Postgres (only about a year in), and it's great. But I'm used
> to a case-blind search as a default. And, honestly, I can *never* remember
> a case when a user asked for a case-sensitive search. Ever. In 30+ years.
> Maybe it's just me. Just kidding, it's not just me. If you're presenting
> users with a search interface, you can find out by asking them. Or you can
> AB test a search UI where there is the option of case-sensitive/blind
> searching, but you randomly flip which is the default. For users,
> case-sensitive searches are assumed. That's what Google does. Seriously,
> Google === Search. It's not a hard test to run. If you find that with a
> case-blind search, 30% of user tick the box to make it case-sensitive, then
> you've got users that often do care about case-sensitive search.
>
> And since it seems to be unclear at a few places in the discussion above: *It
> absolutely makes sense to store data in its original form* and to allow
> for case-sensitive searches, when required. It would be very weird to store
>
> * Call me back Ishmael, I've gotta go...*
>
> and get back anything else, be that
>
>
> * call me back ishmael, i've gotta go...*
>
> or
>
> * CALL ME BACK ISHMAEL, I'VE GOTTA GO...*
>
> As far as I understand it in Postgres, you can:
>
> * Use something like UPPER or LOWER in Every. Single. Search.
>
> * Fold text to one case consistently in an index to make searches
> case-blind.
>
> * Use citext.
>
> * Teach and convince every user to enter and search for data
> case-sensitively correctly Every. Single. Time.
>
> On that last point, good luck. Here's an example, I'm keen on birds. Do
> you write it:
>
> Black-shouldered Kite
> Black-Shouldered Kite
>
>
> On Sun, Mar 17, 2019 at 8:46 AM Peter J. Holzer <hjp-pgsql(at)hjp(dot)at> wrote:
>
>> On 2019-03-16 14:00:34 -0600, Rob Sargent wrote:
>> > What sort of content is in your field of type text? Certainly,
>> in
>> > English
>> > prose, “rob” is different than “Rob”
>> >
>> >
>> > I disagree. While the grammar for written English has rules when to
>> > write "rob" and when to write "Rob", that distinction usually
>> carries no
>> > semantic difference. Consider:
>> [...]
>> > I don’t think it’s solely about the semantics. One might be
>> contractually
>> > obligated to always spell a name in some exact way including it
>> capitalization.
>> > For instance if referring to "Rob Sargent” as a quote or accreditation,
>> then
>> > it’s not okay to let a typo “rob Sargent” go through.
>>
>> 1) Such contracts might exist, but they are only binding to the signing
>> parties, they don't affect what is commonly understood as "the English
>> language". Everybody else will see it as an obvious typo and won't
>> assume that this refers to some "rob Sargent" who is a different person
>> than "Rob Sargent".
>>
>> 2) I don't think the OP was talking about spell-checking. And in any
>> case spell-checking is more complicated than simply comparing strings
>> byte by byte.
>>
>> hp
>>
>>
>> --
>> _ | Peter J. Holzer | we build much bigger, better disasters now
>> |_|_) | | because we have much more sophisticated
>> | | | hjp(at)hjp(dot)at | management tools.
>> __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
>> -----BEGIN PGP SIGNATURE-----
>>
>> iQIzBAABCAAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAlyNbrwACgkQ8g5IURL+
>> KF2mKhAAq8b9RLFBihsO+dQf3pxjLt3bWoD9mhgMEy8GvxADuxAF4dLo1SqfX2S0
>> aIZFEyQgMoNKWOaiylj7TYPiBRilfVSCxggisPsirUKLKpXr4Bw9oIGiPBiE+21m
>> ajlaONOZNiaM9D+BFthFkPM0TcjR2FHTaXOch0HbFnPnDWMgEPwY9yyDeN8ZPCOn
>> 7G002EB3wxHcnhoFm8jGO2E8SL9l0NLU6+CVlCPAenGI0H6gbFatHi1LKTmNe+PH
>> d32Il7Pf4GE0o/k92X0Yil3oqMSU/vV08LSrMVGxUDVAz+cL0/W40V/3BthELi95
>> l7kGJRJfgic7LU7IK84yszxR1ZJt1aBLyFIUpEoEsPl6XuSQsTOm/sFt/1C0L8Io
>> iLYNTnLh2CveKt/bldITAAPlHZ1rF92BUsAJRA2Wci5yv5V7orykDPgE8HzhB5U9
>> m1cHfjInqMi1ogGgSLw0sPH6ZDlT76p1H8RjjQQhD9eAFNYt29xFOUARyyWHVEut
>> ZyWMIeHzWpLRr/h+5y6uJ+LhH4NVMONc3unE6fy1MgIZtIcmeFrbx8+7XGKmv/kD
>> RY12aNOrCF85HdiIRd/ImXOsCWWiq7u7itZQrsqhQVRXNMnV/7M4Hlpe1rBeQ9V+
>> k2FXuFM/eEtyEO+KKYwct+1iYerTTKY6hyCQwBINVewe/Fohd+A=
>> =8B/x
>> -----END PGP SIGNATURE-----
>>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Chris Travers 2019-03-18 12:16:15 Re: Camel case identifiers and folding
Previous Message Morris de Oryx 2019-03-18 11:18:03 Re: Camel case identifiers and folding