Re: Working with huge amount of data. RESULTS!

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Mario Lopez <mario(at)lar3d(dot)com>
Cc: depesz(at)depesz(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: Working with huge amount of data. RESULTS!
Date: 2008-02-12 15:15:49
Message-ID: Pine.LNX.4.64.0802121809410.23796@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 12 Feb 2008, Mario Lopez wrote:

> Hi!,
>
> I optimized the LIKE 'keyword%' and LIKE '%keyword' with the following
> results:
>
> # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from
> table1 where varchar_reverse(data) like varchar_reverse('%keyword');"
> real 0m0.055s
> user 0m0.011s
> sys 0m0.006s
>
>
> # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from
> table1 where data like 'keyword%';"
> real 0m0.026s
> user 0m0.012s
> sys 0m0.006s
>
> It works flawlesly as you can see by the timings, take in consideration that
> "table1" has 100 million records. The only problem is generating the reversed
> index which takes like 20 minutes, I guess it has to do with the plperl
> function, perhaps a C function for inverting would make it up in less time.
>
> The problem is still with the LIKE '%keyword%', my problem is that I am not
> searching for Words in a dictionary fashion, suppose my "data" is random
> garbage, that it has common consecutive bytes. How could I generate a
> dictionary from this random garbage to make it easier for indexing?

suffix tree (array) would speedup '%keyword%' query, but currently it doesn't
supported by GiST extension architecture (we have it in our TODO), so I see
several ways (not tested):

1. try contrib/pg_trgm to reduce a number of candidate words
2. generate all possible substrings and use your tested approach

>
>> On Mon, Feb 11, 2008 at 04:37:24PM +0100, Mario Lopez wrote:
>>
>>> SELECT * FROM names WHERE name LIKE keyword%
>>> Or
>>> SELECT * FROM names WHERE name LIKE %keyword%
>>>
>>
>> check this:
>> http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/
>> and this:
>> http://www.depesz.com/index.php/2007/09/15/speeding-up-like-xxx/
>>
>> depesz
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2008-02-12 15:26:06 Re: TSearch2 Migration Guide from 8.2 to 8.3
Previous Message Thomas Chille 2008-02-12 15:13:33 Some Autovacuum Questions