Quick Links

Working with huge amount of data.

From:	Mario Lopez <mario(at)lar3d(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Working with huge amount of data.
Date:	2008-02-11 15:37:24
Message-ID:	47B06BB4.4050906@lar3d.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hi guys :-), I am working on a personal project in which I am trying to
make sense on a huge (at least for me) amount of data. I have
approximately 150 million rows of unique words (they are not exactly
words it is just for explaining the situation).

The table I am inserting this is a quite simple table, something like this:

CREATE TABLE "public"."names" (
"id" SERIAL,
"name" VARCHAR(255)
) WITHOUT OIDS;

It is a requirement that I can make searches on the varchar with queries
that look the following way:

SELECT * FROM names WHERE name LIKE ‘keyword%’
Or
SELECT * FROM names WHERE name LIKE ‘%keyword%’

I optimized the first type of queries making partitions with every
letter that a name can begin with:

CREATE TABLE "public"."names_a" (
CONSTRAINT "names_a_check" CHECK (("name")::text ~~ 'a%'::text)
) INHERITS ("public"."names")
WITHOUT OIDS;

The problem arises with the second type of queries, where there are no
possible partitions and that the search keywords are not known, I have
tried making indexes on the letter it ends with, or indexes that specify
that it contains the letter specified but none of them work the
planifier only make sequential scans over the table.

For the moment the quickest scan I have being able to make is using
grep!!, surprisingly enough grep searches on an average of 20 seconds a
whole plain text file of 2 GB one name per line and PostgreSQL on the
fist type of queries takes like 50 seconds while the second type of
queries con take up to two minutes which is completely unacceptable for
an online search engine that has to attend a user querying this information.

How does this big search engines let’s say Google make this up? I am
amazed of the quickness on searching this amount of information in so
little time. Any approach I could take? I am open minded so anything is
acceptable not necessarily only PostgreSQL based solutions (although I
would prefer it). By the way Textual Search in PostgreSQL is discarded
because what I am looking at are not names that can be decomposed on
lexems, let's say that this varchar is composed of random garbage.

Thanks for the time taken to read. :-)

Responses

Re: Working with huge amount of data. at 2008-02-11 15:46:21 from Vivek Khera
Re: Working with huge amount of data. at 2008-02-11 15:51:59 from hubert depesz lubaczewski
Re: Working with huge amount of data. at 2008-02-11 16:07:24 from Erik Jones
Re: Working with huge amount of data. at 2008-02-11 16:43:43 from Jonathan Ballet
Re: Working with huge amount of data. at 2008-02-11 17:08:14 from Jeff

Browse pgsql-general by date

	From	Date	Subject
Next Message	Vivek Khera	2008-02-11 15:41:16	Re: Is PG a moving target?
Previous Message	Ivan Sergio Borgonovo	2008-02-11 15:33:54	Re: end of life for pg versions...