From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: tsearch2: language or encoding |
Date: | 2007-07-06 06:57:34 |
Message-ID: | 24929.1183705054@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> I'm wondering if a tsearch's configuration is bound to a language or
> an encoding. If it's bound to a language, there's a serious design
> problem, I would think. An encoding or charset is not necessarily
> bound to a single language. We can find such examples everywhere (I'm
> not talking about Unicode here). LATIN1 includes English and several
> European languages. EUC-JP includes English and Japanese, etc. And
> since we specify an encoding as a char's property, not a language, I
> would say the configuration should be bound to an encoding.
Surely not, because then what do you do with utf8, which (allegedly)
represents every language on earth?
As far as the word-stemming part goes, that is very clearly bound
to a language not an encoding. There may be some other parts of
the code that really are better attached to an encoding --- Oleg,
Teodor, your thoughts?
regards, tom lane
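
As an illustration of the point about stemming, here is a minimal sketch
(not part of the original message, and not tsearch2 code). It assumes two
hypothetical toy suffix lists standing in for real stemmers. The names
TOY_SUFFIXES and toy_stem are made up for this example. It shows that the
same words encode equally well in one encoding, yet the stemming result
depends on which language's rules are chosen.

```python
# Toy illustration only: these suffix lists are hypothetical stand-ins,
# not real stemmers and not tsearch2's actual dictionaries.
TOY_SUFFIXES = {
    "english": ("ing", "ed", "s"),
    "german": ("en", "er", "e"),
}


def toy_stem(word: str, language: str) -> str:
    """Strip the first matching suffix from the given language's list."""
    for suffix in TOY_SUFFIXES[language]:
        # Require a few remaining characters so short words stay intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


words = ["searching", "suchen"]  # an English word and a German word

# Both words are representable in the same encodings, so the encoding
# alone does not say which stemming rules apply.
encoded_utf8 = [w.encode("utf-8") for w in words]
encoded_latin1 = [w.encode("latin-1") for w in words]

# The outcome changes with the language, not with the encoding.
english_result = [toy_stem(w, "english") for w in words]
german_result = [toy_stem(w, "german") for w in words]

print(english_result)
print(german_result)
```

Each rule set only reduces the word from its own language, which is why
a stemmer needs to know the language and cannot infer it from the
encoding.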