Re: Q: text query search and

From: Dane Foster <studdugie(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Q: text query search and
Date: 2015-07-23 18:02:04
Message-ID: CA+WxinKrdssdSWw-bGdA=_6vQ_Cj+6CTe694N_4171DcBTmpNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Kevin,

I am also interested in a deeper explanation of what you mean by, "The easy
way to do that is just to add a trigram index and search for similar
strings, and forget about full text search." Because I need to make a
decision about whether to use full text search or use other pattern
matching facilities such as LIKE and/or regular expressions. For me, the
reason I don't just default to full text search is the documents are
relative small (i.e, HTML <= 128K) and number fewer than 10,000 so I'm not
sure if the effort expended to learn the ins/outs of full text search will
be beneficial to my use case.

Regards,

Dane

On Thu, Jul 23, 2015 at 12:17 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Thu, Jul 23, 2015 at 2:55 AM, amihay gonen <agonenil(at)gmail(dot)com> wrote:
>
>> Hi I'm trying to implement a text search in PG .
>>
>> My goal to enable the user search on several columns also on partial
>> words.
>>
>> here is sample code :
>> create table test_test( text_data tsvector, text_a varchar,text_b
>> varchar);
>>
>> insert into test_test(text_a,text_b) select 'name 10.10.2.3 ','name3
>> name' from generate_series(1,500);
>> update test_test set text_data=to_tsvector(text_a||' '||text_b);
>> CREATE INDEX test_test_idx ON test_test USING gin(text_data);
>>
>> explain ANALYZE select * from test_test where text_data@
>> @plainto_tsquery('name');
>>
>> my questions are :
>> 1. why the index is not used (I guess it is related to the way the data
>> is generated)
>>
>
> It returns the entire table, so there is no point in using an index. Yes,
> it is the way it is generated, the same data repeated over and over is not
> very realistic. If you just want random text, I use md5(random()::text).
> But if you want text that looks vaguely like English, I don't have a nice
> trick for that. Maybe load the sgml files into a table.
>
>
>> 2, how can I use pg_trgm with ts_vector to enable to answer query like
>> 10.10 or nam ?
>> the idea is to use the gin index , maybe there are other option
>> without using pg_trgm?
>>
>
> Do you mean:
>
> WHERE text_a LIKE '%10.10%' or text_a LIKE '%nam%' ?
>
> With the or, that going to be hard to optimize.
>
> Anyway, pg_tgrm requires its own special GIN index, it can't piggy back on
> the tsvector GIN index.
>
> CREATE INDEX whatever ON test_test USING gin(text_a gin_trgm_ops);
>
> or
>
> CREATE INDEX whatever ON test_test USING gin((text_a||' '||text_b)
> gin_trgm_ops);
>
> But, LIKE '%10.10%' is going to be tough for a pg_trgm index to help with,
> unless you compile your own code after removing "#define KEEPONLYALNUM"
>
> Cheers,
>
> Jeff
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2015-07-23 18:02:06 Re: Using the database to validate data
Previous Message Jeff Janes 2015-07-23 16:17:33 Re: Q: text query search and