| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
| Cc: | euler(at)timbira(dot)com, teodor(at)sigaev(dot)ru, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: [Fwd: Re: tsearch in core patch] |
| Date: | 2007-06-25 04:26:04 |
| Message-ID: | 12580.1182745564@sss.pgh.pa.us |
| Lists: | pgsql-hackers |
Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:

> Ok, probably we need to copy the English stemming rule to the one for
> Japanese.

Pardon my ignorance here, but is the concept of stemming even relevant
to Japanese/Chinese/Korean? What little I know about ideographic
languages suggests it wouldn't work well. And surely the specific rules
in the Snowball project's English stemmer wouldn't work.

> I think same thing (commonly used English with local
> language) can be applied to Chinese and Korean.

Well, it's not hard at all to find chunks of English text that have
embedded bits of French, Spanish, or what-have-you, but that's not an
argument for trying to intermix the stemmers. I doubt that such simple
bits of program could tell the language difference well enough to
determine which stemming rules to apply.

regards, tom lane