Re: PATCH: CITEXT 2.0 v2

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: CITEXT 2.0 v2
Date: 2008-07-07 17:20:03
Message-ID: 2FB38675-96B8-4AF6-88F0-9EE271016A38@kineticode.com
Lists: pgsql-hackers

On Jul 7, 2008, at 07:41, Zdenek Kotala wrote:

> However, it seems to me that the code is OK now (excluding citex_eq). I
> think there are two open problems/questions:
>
> 1) regression test -
>
> a) I think that the regression test is not correct. It depends on the
> en_US locale, but it uses characters which are not in the related
> character repertoire. This means the comparison is not defined, and I
> guess it could generate different results on different OSes, depending
> on the locale implementation.

That I don't know about. The test requires en_US.UTF-8, at least at
this point. How are tests run on the build farm? And how else could I
ensure that comparisons are case-insensitive for non-ASCII characters
other than requiring a Unicode locale? Or is it just an issue for the
sort order tests? For those, I could potentially remove accented
characters, just as long as I'm verifying in other tests that
comparisons are case-insensitive (without worrying about collation).
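
To make that split concrete, here is a minimal sketch in Python (not the citext C code or its pgTAP tests) that keeps equality checks apart from sort-order checks. The helper names `ci_equal` and `ci_sorted` are hypothetical, and Python's `str.lower()` is only a stand-in for whatever case folding the database locale provides.

```python
# Illustrative stand-in only: Python's str.lower() is Unicode-aware and does
# not depend on the process locale, unlike a locale-driven database comparison.

def ci_equal(a: str, b: str) -> bool:
    """Case-insensitive equality: lowercase both sides, then compare."""
    return a.lower() == b.lower()


def ci_sorted(words):
    """Case-insensitive ordering: sort on the lowercased form."""
    return sorted(words, key=str.lower)


# Equality checks involve no ordering, so accented characters are safe here.
# "\u00c0" is A-grave and "\u00e0" is a-grave.
equality_cases = [("\u00c0", "\u00e0"), ("citext", "CITEXT")]
all_equal = all(ci_equal(a, b) for a, b in equality_cases)

# Ordering checks stick to unaccented ASCII, so the expected order does not
# hinge on where a given collation places accented characters.
ordered = ci_sorted(["banana", "Apple", "cherry"])
```

The point of the split is that the equality cases can keep their non-ASCII coverage, while the ordering cases avoid any input whose position would vary between locale implementations.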

> b) pgTAP is something new. We need to decide whether this framework
> is acceptable or not.

Well, from the point of view of `make installcheck`, it's invisible.
I've submitted a talk proposal for pgDay.US on pgTAP. I'm happy to
discuss it further here though, if folks are interested.

> 2) contrib vs. pgFoundry
>
> There is an unresolved question of whether we want to have this in
> contrib or not. It is good to mention that the citext type will be
> obsoleted by a full collation implementation in the future. I
> personally prefer to keep it on pgFoundry because it is a temporary
> workaround (in my opinion), but I can live with a contrib module as well.

I second what Andrew has said in reply to this issue. I'll also say
that, since people *so* often end up using `WHERE LOWER(col) =
LOWER(?)`, it'd be very valuable to have citext in contrib,
especially since so few folks seem to even know about pgFoundry, let
alone be able to find it. I mean, look at these search results:

http://www.google.com/search?q=PostgreSQL%20case-insensitive%20text

My blog entry about this patch is hit #3. pgFoundry (and CITEXT 1) is
#7. Last time I did a query like this, it didn't turn up at all.

Believe me, I'd love for pgFoundry (or something like it) to become the
CPAN for PostgreSQL. But without some love and SEO, I don't think
that's gonna happen.

Besides, CITEXT 2 would be a PITA to maintain for both 8.3 and 8.4,
given the changes in the string comparison API in CVS.
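
For anyone who hasn't run into the `WHERE LOWER(col) = LOWER(?)` idiom mentioned above, a rough sketch of what it looks like from application code follows. It uses Python's built-in `sqlite3` module purely so that it runs without a server; SQLite stands in for PostgreSQL here, its built-in `LOWER()` folds only ASCII by default, and the `users` table, `nick` column, and `find_nick` helper are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (nick TEXT)")
conn.executemany(
    "INSERT INTO users (nick) VALUES (?)",
    [("Larry",), ("TOM",), ("damian",)],
)


def find_nick(conn, nick):
    """Look up a nick case-insensitively by lowercasing both sides."""
    rows = conn.execute(
        "SELECT nick FROM users WHERE LOWER(nick) = LOWER(?)",
        (nick,),
    ).fetchall()
    return [row[0] for row in rows]


matches = find_nick(conn, "tom")
```

Every query against the column has to remember the two `LOWER()` calls. With a case-insensitive type such as citext on the column, the comparison would simply be `nick = ?`.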

Thanks,

David
