Re: tesearch2 question

From: Sumeet <asumeet(at)gmail(dot)com>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: tesearch2 question
Date: 2007-03-07 20:40:34
Message-ID: 7539aebb0703071240l6c9c091eq163b3df16040a06f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Thanks Oleg,

My String message are Abstracts of papers, I did a

$ select avg(len) from (select length(abstract) as len from master_table
limit 500) E;

avg
-----------------------
1355.5907859078590786
(1 row)

so length is approx 1400.

I couldn't find any appropriate way to analyze the time for update queries,
but what i did was a explain analyze

$ explain analyze select to_tsvector(article_title) from master_table limit
1000;

The total runtime was approx 500ms.

The server is Sun OS 5.10, with around 8gigs of RAM and 6cpus.

Thanks,
Sumeet.

On 3/7/07, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>
> On Wed, 7 Mar 2007, Sumeet wrote:
>
> > Hi All,
> >
> > I'm trying to udpate a table containing 13149741 records. And its
> taking
> > forever to complete this process.
> >
> > The update query i'm trying to run is for full text indexing similiar to
> >
> > UPDATE tblMessages SET idxFTI=to_tsvector(strMessage);
> >
>
> How big are your strMessage ? and what's your tsearch2 configuration ?
> Can you estimate how long takes updating, for example, 1000 rows ?
> It looks like your system is IO bound. What's your hardware ?
>
> >
> > Below are some of the stats which might be helpful for analyzing this
> >
> > $top
> >
> > PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
> > 3091 postgres 1 43 0 46M 38M cpu/1 200:06 3.20% postgres
> > 5052 postgres 1 60 0 149M 134M sleep 0:17 3.12% postgres
> >
> > <<<here are the top 2 processes, out of which the first process i have
> been
> > running almost for a day and a half and it is still running,
> >
> > This table which i'm trying to update has 10 indexes
> >
> > =========================================================
> > "a_article_pk" PRIMARY KEY, btree (id)
> > "a_article_uk_pmid" UNIQUE, btree (pmid)
> > "a_article_idx_abstract" btree ("substring"(abstract::text, 0, 255))
> > "a_article_idx_date_cr_year" btree (date_cr_year)
> > "a_article_idx_ml_journal_info_medline_ta" btree
> (ml_journal_info_a_ta)
> > "a_article_idx_owner" btree ("owner")
> > "a_article_idx_pmid" btree (pmid)
> > "a_article_idx_status" btree (status)
> > "a_article_idx_title" btree (article_title)
> > "a_master_t_idx_year_published" btree (published_year)
> > ========================================================
> > But no indexes on the field i'm trying to update. The field i'm trying
> to
> > add is a new field.
> > Can anyone help me out to figure out why is it taking so much time to
> update
> > the table.
> >
> > Also as u see in the above indexes, I have some indexes on some varchar
> > column which i feel are totally useless unless u so a exact string
> match.
> > But does that help in any sense for improving the speed of retreiving
> the
> > string just normally without any search on it?
> >
> > Thanks,
> > Sumeet.
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>

--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Oleg Bartunov 2007-03-07 20:49:27 Re: tesearch2 question
Previous Message Andrej Ricnik-Bay 2007-03-07 20:31:39 Re: SHA-1 vs MD5