Re: VACUUM FULL

From: Brian McCane <bmccane(at)mccons(dot)net>
To: "Rajesh Kumar Mallah(dot)" <mallah(at)trade-india(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: VACUUM FULL
Date: 2002-05-11 16:53:38
Message-ID: 20020511044230.F99530-100000@fw.mccons.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin


I have tried contrib/tsearch, but I do fairly complex score computation
based on word placement (headers,body,graphics,metatags, etc). And also
do additional computations on the scores when the user does a search. It
is very quick to use something like tsearch to figure out which pages
match my criterion, but very slow to then compute a valid score for each
page. Instead, in an attempt to speed things up, a score is computed for
each distinct word on a page, and that score is stored in this table.
Then when a search is requested, I use previously existing search
information and data from this table to compute the new results.

The upshot of all this computing is that I get fairly relevant results,
but I have to do a lot of math to get there. I have been considering
combining the two methods, doing something like a full text search to find
pages that meet my criterion, and then using this table to actually
compute a pages score.

- brian

On Sat, 11 May 2002, Rajesh Kumar Mallah. wrote:

>
> Hi Brian ,
>
> are you performing full text search in any case?
>
> Apart from optimizing the TABLE/INDEXES (thru VACUUM measures)
>
> i feel using text indexes provided by contrib/tsearch can also
> lead to significant improvement in search performance.
>
> regds
> mallah.
>
> On Saturday 11 May 2002 05:48 am, Brian McCane wrote:
> > Okay, I guess I misunderstood something about "VACUUM FULL". My
> > understanding was that a VACUUM (without FULL), marked unused records for
> > reuse. VACUUM FULL moved records from the end of a file to the holes
> > where unused records were stored and truncated the file to free disk
> > space. So far so good, but....
> >
> > I have had continued loss of performance on one of my largest tables
> > (600,000,000 records). There are 4 fields in a record like such:
> >
> > CREATE TABLE foo (
> > a int not null references bar(a) on delete cascade on update no action,
> > b int not null references baz(b) on delete cascade on update no action,
> > c int,
> > d smallint,
> > primary key(a, b)) ;
> > CREATE INDEX foo_ac ON foo (a,c) ;
> > CREATE INDEX foo_ad on foo (a,d) ;
> > And there are 3 triggers which fire before insert/delete/update.
> >
> > I INSERT/UPDATE/DELETE approximately 300,000 records per day, but this
> > number is increasing on a daily basis as I make changes which improve the
> > performance of my data gathering tools (spiders ;). Two days ago, it had
> > reached the point where a search for a 3-word term (ie. free news servers)
> > took about 60 seconds. I have just spent 48 hours running a VACUUM FULL
> > on my table, and now the same search takes < 10 seconds. I assume that
> > the increase in performance is due to the decrease in table/index size
> > which added up to approximate 1GB of freed space on the machine, which was
> > approximately 4% of the original size of the table and all its indices.
> > But, a 4% decrease in size should not add up to a 84% increase in
> > performance (is that right? I always get the ratio confused :).
> >
> > If all that VACUUM FULL did was move records from file 12345678.6 to file
> > 12345678, the database would still being doing a large number of random
> > accesses on the table. However, if VACUUM FULL clusters the data
> > according to the primary key, it would still be doing a large number of
> > random access on the table, because the primary key has almost nothing to
> > do with how I actually access the data in real life. So, is VACUUM FULL
> > looking somewhere in pg_statistics (or pg_stat_user_indexes), to determine
> > which index I actually use most (foo_ad), and then clustering the data
> > that way, or is there some other agent at work here.
> >
> > - brian
> >
> >
> > Wm. Brian McCane | Life is full of doors that won't open
> > Search http://recall.maxbaud.net/ | when you knock, equally spaced amid
> > those Usenet http://freenews.maxbaud.net/ | that open when you don't want
> > them to. Auction http://www.sellit-here.com/ | - Roger Zelazny "Blood of
> > Amber"
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> > message can get through to the mailing list cleanly
>
> --
> Rajesh Kumar Mallah,
> Project Manager (Development)
> Infocom Network Limited, New Delhi
> phone: +91(11)6152172 (221) (L) ,9811255597 (M)
>
> Visit http://www.trade-india.com ,
> India's Leading B2B eMarketplace.
>
>

Wm. Brian McCane | Life is full of doors that won't open
Search http://recall.maxbaud.net/ | when you knock, equally spaced amid those
Usenet http://freenews.maxbaud.net/ | that open when you don't want them to.
Auction http://www.sellit-here.com/ | - Roger Zelazny "Blood of Amber"

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Brian McCane 2002-05-11 17:10:00 Re: VACUUM FULL
Previous Message Rajesh Kumar Mallah. 2002-05-11 05:27:50 Re: VACUUM FULL