Re: pg full text search very slow for Chinese characters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jimmy Huang <jimmy_huang(at)live(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: pg full text search very slow for Chinese characters
Date: 2019-09-10 16:21:45
Message-ID: 2533.1568132505@sss.pgh.pa.us
Lists: pgsql-general

Jimmy Huang <jimmy_huang(at)live(dot)com> writes:
> I tried pg_trgm and my own customized token parser https://github.com/huangjimmy/pg_cjk_parser

pg_trgm is going to be fairly useless for indexing text that's mostly
multibyte characters, since its unit of indexable data is just 3 bytes
(not characters). I don't know of any comparable issue in the core
tsvector logic, though. The numbers you're quoting do sound quite awful,
but I share Cory's suspicion that it's something about your setup rather
than an inherent Postgres issue.
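
To make the byte-versus-character point concrete, here is a minimal sketch of the arithmetic only. It is not pg_trgm's code and says nothing about how the extension handles multibyte input internally; the helper names and sample strings are made up for illustration, and UTF-8 encoding is assumed. It just shows that a three-character window of ASCII text fits in 3 bytes, while a three-character window of Chinese text does not.

```python
# Illustrative sketch, not pg_trgm's implementation.
# Assumption: the text is stored as UTF-8.

def char_trigrams(text):
    """Return the overlapping 3-character windows of text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]

def utf8_len(s):
    """Number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))

ascii_word = "search"    # hypothetical sample, 6 characters
cjk_word = "全文搜索"     # hypothetical sample, 4 characters

# Byte size of each 3-character window.
ascii_sizes = [utf8_len(t) for t in char_trigrams(ascii_word)]
cjk_sizes = [utf8_len(t) for t in char_trigrams(cjk_word)]

print(ascii_sizes)  # every ASCII window is 3 bytes
print(cjk_sizes)    # every window of these CJK characters is 9 bytes
```

Each of the sample Chinese characters takes 3 bytes in UTF-8, so a 3-byte unit covers roughly one such character rather than three.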

regards, tom lane
