Re: pg full text search very slow for Chinese characters

From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: pg full text search very slow for Chinese characters
Date: 2019-09-10 16:42:26
Message-ID: VisenaEmail.3.8750116fce15432e.16d1c0b2b28@tc7-visena
Lists: pgsql-general

On Tuesday, 10 September 2019 at 18:21:45, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

Jimmy Huang <jimmy_huang(at)live(dot)com> writes:
> I tried pg_trgm and my own customized token parser
> https://github.com/huangjimmy/pg_cjk_parser

pg_trgm is going to be fairly useless for indexing text that's mostly
multibyte characters, since its unit of indexable data is just 3 bytes
(not characters). I don't know of any comparable issue in the core
tsvector logic, though. The numbers you're quoting do sound quite awful,
but I share Cory's suspicion that it's something about your setup rather
than an inherent Postgres issue.
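A minimal sketch of the point above (sample strings are only illustrative;
show_trgm() exposes the trigrams pg_trgm would index):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- ASCII text: trigrams are readable 3-character substrings
    SELECT show_trgm('postgres');

    -- Multibyte text: each trigram is compressed into the same 3-byte
    -- unit, so far less information survives per trigram
    SELECT show_trgm('数据库全文搜索');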

regards, tom lane

We experienced quite awful performance when we hosted the DB on virtual
servers (~5 years ago); it turned out we had hit the write-cache limit
(then 8 GB), which capped I/O throughput at ~1 MB/s. Running iozone might
help track down I/O problems.
--
Andreas Joseph Krogh
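For anyone wanting to check raw disk throughput the same way, something
along these lines might serve as a starting point (a sketch; the file path
is a placeholder, and the 8 kB record size is chosen to match PostgreSQL's
block size):

    # write/rewrite (-i 0) and read/re-read (-i 1) tests, O_DIRECT (-I)
    # to bypass the OS page cache, fsync included in timing (-e)
    iozone -e -I -i 0 -i 1 -s 8g -r 8k -f /var/lib/postgresql/iozone.tmp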
