Re: Full Text Search dictionary issues

From: Steve Grey <stevegrey78(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Full Text Search dictionary issues
Date: 2010-07-16 15:02:27
Message-ID: AANLkTikddQPfCTCM7bqei38TC-GcB2Aj_qwN5Slc14ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Does it run any differently if you split out the tag?

select count(*) from search_rm where to_tsvector('english', textsearch) @@
to_tsquery('english', 'wommmman & batt') and to_tsvector('english',
textsearch) @@ 'ftx1'::tsquery

Steve

On 16 July 2010 05:22, Howard Rogers <hjr(at)diznix(dot)com> wrote:

> On Fri, Jul 16, 2010 at 10:23 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Howard Rogers <hjr(at)diznix(dot)com> writes:
> >> I have 10 million rows in a table, with full text index created on one
> >> of the columns. I submit this query:
> >
> >> ims=# select count(*) from search_rm
> >> ims-# where to_tsvector('english', textsearch)
> >> ims-# @@ to_tsquery('english', 'woman & beach & ball');
> >> count
> >> -------
> >> 646
> >> (1 row)
> >> Time: 107.570 ms
> >
> >> ...and those are excellent times. But if I alter the query to read:
> >
> >> ims=# select count(*) from search_rm
> >> where to_tsvector('english', textsearch)
> >> @@ to_tsquery('english', 'woman & beach & ftx1');
> >> count
> >> -------
> >> 38343
> >> (1 row)
> >> Time: 640.985 ms
> >
> >> ...then, as you see, it slows the query down by a factor of about 6,
> >
> > ... um, but it increased the number of matching rows by a factor of
> > almost 60. I think your complaint of poor scaling is misplaced.
> >
> >> which is not so good! The problem is that we need to be able to search
> >> for "ftx1", since that's a flag we put in our document records to tell
> >> us the file type, and we need to be able to retrieve different file
> >> types at different times.
> >
> > You might want to rethink how you're doing that --- it seems like a file
> > type flag ought to be a separate column rather than a word in a text
> > field.
> >
> > regards, tom lane
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >
>
> OK, Tom: I did actually account for the number of rows difference
> before I posted, though I accept I didn't show you that. So here goes:
>
> ims=# select count(*)
> ims-# from search_rm
> ims-# where to_tsvector('english', textsearch) @@
> to_tsquery('english', 'wommmman & batt & ftxa')
> ims-# limit 20;
> count
> -------
> 0
> (1 row)
>
> Time: 0.593 ms
> ims=# select count(*)
> from search_rm
> where to_tsvector('english', textsearch) @@ to_tsquery('english',
> 'wommmman & batt & ftx1')
> limit 20;
> count
> -------
> 0
> (1 row)
>
> Time: 489.362 ms
>
> Both queries return zero rows. One takes an awful lot longer than the
> other. The only difference between them is that one searches for
> 'ftx1' and the other searches for 'ftx0'. My complaint of poor
> scalability (actually, it was an enquiry about the role of dictionary
> types!) is valid, I think. As a PostgreSQL newbie, I'm happy to accept
> that I've done something plonkingly stupid to account for these
> results, but I'd then like to know what it is I've done wrong! A
> simple scale-up of the number of hits isn't, however, the problem, I
> don't think.
>
> With this amount of data, and with 45 different document attributes
> that may or may not be searched for, some of them involving names and
> places and dates, some just yes/no flags, it is utterly impossible to
> have them as separate attribute columns and search on them with
> anything like decent performance. We adopted this approach with Oracle
> Text two years ago precisely because it was the only way to keep
> web-based searches of 10,000,000 records coming back in less than a
> second. So, no, we're not going to re-think the storage of 'attribute
> data' as part of the searchable keyword field, though I'm more than
> prepared to alter the precise format of that data if it helps
> PostgreSQL any.
>
> That said, however, we have people supplying us with document
> references in the form DA3-76374YY-001, so alpha-numerics simply have
> to be searchable with good speed, and I can't always magic-away the
> alpha-numeric components, even if I wanted to.
>
> So, I would still like to know if this performance difference when
> encountering alpha-numeric "words" is dictionary-related, and if so
> what I can do to fix that, please.
>
> Cheers,
> HJR
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2010-07-16 15:16:27 Re: Full Text Search dictionary issues
Previous Message Wappler, Robert 2010-07-16 14:34:22 Planner decisions