Re: Answering my own question

From: gerry(dot)smit(at)lombard(dot)ca
To: "Francisco Reyes" <lists(at)natserv(dot)com>
Cc: ghaverla(at)freenet(dot)edmonton(dot)ab(dot)ca, pgsql-novice(at)postgresql(dot)org
Subject: Re: Answering my own question
Date: 2002-05-15 19:59:41
Message-ID: OF4AAF6E43.EB57FA6E-ON85256BBA.006B19B6@lombard.ca
Lists: pgsql-novice


Well, having copies of the stats in the server.log obscure the timing info
in the VACUUM log doesn't help. And the stats are fairly unreadable, at
least to this novice. :-)

VACUUM FULL is new for me, and I've been wondering how to recycle
disk space. THANKS.

VACUUM ANALYZE - producing stats for the QUERY PLANner - is also something
I wasn't aware of.
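For reference, the variants under discussion look like this in psql (a sketch assuming PostgreSQL 7.2 or later, where VACUUM FULL is a separate form; in 7.1 a plain VACUUM already did the compaction):

```sql
VACUUM;          -- marks the space of deleted rows as reusable
VACUUM FULL;     -- also compacts tables and returns space to the OS (heavier locking)
VACUUM ANALYZE;  -- vacuums and refreshes the query planner's statistics
```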

Here's the scenario. We have about 8 tables or so, in 4 regional databases.

- two of the tables are updated via on-line transactions from our users.
Probably a couple of hundred records a day, tops.
- records eligible for archiving are "SELECT INTO"d and "DELETE"d
about once a quarter; typically 1000 or so.
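That quarterly archive step could be sketched like this; the table and column names (claims, claims_archive, closed_date) are invented for illustration, not taken from the actual schema:

```sql
-- Copy the eligible rows into an archive table...
SELECT * INTO claims_archive
  FROM claims
 WHERE closed_date < '2002-01-01';

-- ...then remove them from the live table.
DELETE FROM claims
 WHERE closed_date < '2002-01-01';
```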

- the rest of the tables are updated from the Mainframe batch cycle, at
about 3:00 am local time.
- these records NEVER get updated, just more added each day.

- the problem tables have about 800,000 records in them.
- the daily table gets 3000-5000 new records a day.
- once a month, records more than 12 months old are purged, no
archive required.
- the non-daily table is DROPd and rebuilt monthly.
Newly generated records are loaded into it from batch
processes that take about two days, at the start of each month.
- thus the "high water" mark is highest just before monthend.
- daily volumes are slowly increasing over time, so the "high water
mark" is also going higher.

VACUUM is run nightly, at about 7:00 pm. This is to make the system ready
for UNIX back-ups, at about 9:30 pm. Also, in the past, the VACUUM has
taken hours, and seems to lock the database (not just the table). Thus
we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.
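That nightly schedule might be sketched in cron as follows; the database name, log path, and psql flags are placeholder assumptions, not details from the actual setup:

```
# Nightly VACUUM at 7:00 pm, ahead of the 9:30 pm UNIX back-ups
0 19 * * *  psql -q -d regional_db -c 'VACUUM;' >> /var/log/pg_cron_log 2>&1
```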

The primary reason for the hours of VACUUM was the non-daily table, above,
back when we DELETEd most records and then reloaded over two days.
Deleting 780,000 records of an 800,000-row table creates an IMMENSELY long
VACUUM. Hence our DROP and rebuild.

I suspect that what I should do is :

1) continue nightly VACUUMs at about 7:00 pm
2) add a weekly VACUUM FULL ANALYZE, to be run Saturday evening, ensuring
it's ready for Sunday's FULL disk back-ups.
3) run a VACUUM FULL ANALYZE tonight as a one-shot.
4) Consider adding the statement "SET enable_seqscan='false';" to the TWO
locations in our on-line system where the problem SELECTs are coded.
However, during my testing, SET returns the output "SET VARIABLE", which is
being interpreted as the first line of output from the accompanying SELECT
statement. Any ideas on how to make the SET statement "quiet"?

Step 4 may not be necessary if Steps 1-3 do the job, btw.
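One possible workaround, assuming the on-line system reads results statement by statement (an assumption about the interface, not a tested recipe - table and column names are invented): send the SET as its own statement and discard its response before issuing the SELECT:

```sql
-- Sent first; its "SET VARIABLE" response is read and thrown away.
SET enable_seqscan = 'false';

-- Sent separately; its output now begins with the first result row.
SELECT policy_no, status FROM claims WHERE region = 'ON';
```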

Oh, we're a 16/5 operation, with some people dialing in or coming in on
weekends. 24/7 is NOT a requirement.

Gerry Smit,
Toronto, Ontario


"Francisco
Reyes" To: gerry(dot)smit(at)lombard(dot)ca
<lists(at)natserv(dot)c cc: ghaverla(at)freenet(dot)edmonton(dot)ab(dot)ca, pgsql-novice(at)postgresql(dot)org
om> Fax to:
Subject: Re: [NOVICE] Answering my own question
15/05/2002 03:29
PM

On Wed, 15 May 2002 gerry(dot)smit(at)lombard(dot)ca wrote:

> Now THAT's REALLY fascinating. We upgraded from 6.5 to 7.1 about 4-5
> months ago, and only got this problem after that. I leave "ANALYZE" out of
> VACUUM due to the large volume of (I thought needless) output that gets
> piped to my cron_log.

Display of activities is controllable through settings.
It is also somewhat silly not to run analyze just because it displays too
much data in a log.

> Does anyone have a feel for how "necessary" VACUUM ANALYZE is over
> "VACUUM"?
> Gerry

Like many other things in life, the answer is: it depends.
Vacuum is about being able to reuse the space used by rows that have
been deleted.

Vacuum Full is about recovering the disk space used by deleted rows.

Analyze is about re-computing the statistics that the planner uses to
determine the best strategies.

The answer to your question will be directly related to how much and how
often new data is inserted in your database.

If you NEVER change your data or make inserts then you NEVER have to run
either vacuum or analyze.

If you have a very heavily loaded environment with hundreds or thousands
of transactions per minute, then an hourly vacuum analyze would probably
help.

Why don't you tell us more about your particular scenario?

For instance I do ONE big load every morning from a production Foxpro
system to my reporting PostgreSQL system. I do one vacuum analyze just
after the load, then I compute some stats and populate some files, then
another vacuum analyze. At 10pm I do a vacuum full. This works great for
me.

Before, I was doing one vacuum analyze after the stats computations, and it
was horrible. Something that takes less than an hour to compute when I have
done a vacuum analyze took about 4 hours without it, and that was just ONE
of the things that needed to be done. The first vacuum analyze takes about
50 minutes and the second takes about 8 minutes: a lot less than the 4+
hours lost by doing the stats computations without them.
