From: | Arjen van der Meijden <acmmailing(at)vulcanus(dot)its(dot)tudelft(dot)nl> |
---|---|
To: | "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Mnogosearch (Was: Re: website doc search is ... ) |
Date: | 2004-01-01 19:14:49 |
Message-ID: | 3FF471A9.8010806@vulcanus.its.tudelft.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Marc G. Fournier wrote:
> On Thu, 1 Jan 2004, Arjen van der Meijden wrote:
>
>
>>Marc G. Fournier wrote:
>>
>>>Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
>>>join'd to all the URLs that contain them, you get:
>>
>>Can't you build seperate databases for each domain you want to index?
>>Than you wouldn't need the like operator at all.
>
>
> First off, that would make searching across multiple domains difficult,
> no?
If mnogosearch would allow searching in multiple databases; no. But it
doesn't seem to feature that and indeed; yes that might become a bit
difficult.
It was something I thought of because our solution allows it, but that
is no solution for you, I checked the mnogosearch features after sending
that email, instead of before. Perhaps I should've turned that around.
> Second, the LIKE is still required ... the LIKE allows the search to
> "group" URLs ... for instance, if I wanted to just search on the docs, the
> LIKE would look for all URLs that contain:
>
> http://www.postgresql.org/docs/%%
>
> whereas searching the whole site would be:
>
> http://www.postgresql.org/%%
That depends. If it were possible, you could decide from the search
usage stats to split /docs from the "the rest" of www.postgresql.org and
by that avoiding quite a bit of like's.
>>Anyway, that doesn't help you much, perhaps decreasing the size of the
>>index-tables can help, are they with OIDs ? If so, wouldn't it help to
>>recreate them without, so you save yourselves 4 bytes per word-document
>>couple, therefore allowing it to fit in less pages and by that speeding
>>up the seqscans.
>
>
> This one I hadn't thought about ... for some reason, I thought that
> WITHOUT OIDs was now the default ... looking at that one now ...
No, it's still the default to do it with oids.
>>By the way, can a construction like (tablefield || '') ever use an index
>>in postgresql?
>
>
> again, as shown in a previous email, the index is being used for the LIKE
> query ... the big problem as I see it is that the result set from the LIKE
> is ~20x larger then the result set for the the = ... if there was some way
> to telling the planner that going the LIKE route was the more expensive of
> the two (even though table size seems to indicate the other way around), I
> suspect that that would improve things also ...
Yeah, I noticed. Hopefully Tom's suggestion will work to achieve that.
I can imagine how you feel about all this, I had to do a similar job a
year ago, but was less restricted by a preference like the "it'd be a
nice postgresql showcase". But then again, our search engine is loaded
with an average of 24 queries per minute (peaking to over 100/m in the
afternoon and evenings) and we didn't have any working solution (not
even a slow one).
Good luck,
Arjen van der Meijden
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2004-01-01 19:17:50 | Re: Mnogosearch (Was: Re: website doc search is ... ) |
Previous Message | Marc G. Fournier | 2004-01-01 19:10:00 | Mnogosearch: Comparing PgSQL 7.4 to MySQL 4.1 |