From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net> |
Cc: | Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>, pgsql-performance(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] More detail on settings for pgavd? |
Date: | 2003-11-21 06:24:45 |
Message-ID: | 200311202224.45065.josh@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Matthew,
> > 110% of a 1.1 million row table is updated, though, that vaccuum will
> > take an hour or more.
>
> True, but I think it would be one hour once, rather than 30 minutes 4
> times.
Well, generally it would be about 6-8 times at 2-4 minutes each.
> This is one of the things I had hoped to add to pg_autovacuum, but never
> got to. In addition to just the information from the stats collector on
> inserts updates and deletes, pg_autovacuum should also look at the FSM,
> and make decisions based on it. Anyone looking for a project?
Hmmm ... I think that's the wrong approach. Once your database is populated,
it's very easy to determine how to set the FSM for a given pg_avd level. If
you're vacuuming after 20% updates, for example, just set fsm_pages to 20% of
the total database pages plus growth & safety margins.
I'd be really reluctant to base pv-avd frequency on the fsm settings instead.
What if the user loads 8GB of data but leaves fsm_pages at the default of
10,000? You can't do much with that; you'd have to vacuum if even 1% of the
data changed.
The other problem is that calculating data pages from a count of
updates+deletes would require pg_avd to keep more statistics and do more math
for every table. Do we want to do this?
> But I can't imagine that 2% makes any difference on a large table. In
> fact I would think that 10-15% would hardly be noticable, beyond that
> I'm not sure.
I've seen performance lag at 10% of records, especially in tables where both
update and select activity focus on one subset of the table (calendar tables,
for example).
> Valid points, and again I think this points to the fact that
> pg_autovacuum needs to be more configurable. Being able to set
> different thresholds for different tables will help considerably. In
> fact, you may find that some tables should have a vac threshold much
> larger than the analyze thresold, while other tables might want the
> opposite.
Sure. Though I think we can make the present configuration work with a little
adjustment of the numbers. I'll have a chance to test on production
databases soon.
> I would be surprized if you can notice the difference between a vacuum
> analyze and a vacuum, especially on large tables.
It's substantial for tables with high statistics settings. A 1,000,000 row
table with 5 columns set to statistics=250 can take 3 minutes to analyze on a
medium-grade server.
> I think you need two separate schedules. There are lots of times where
> a vacuum doesn't help, and an analyze is all that is needed
Agreed. And I've just talked to a client who may want to use pg_avd's ANALYZE
scheduling but not use vacuum at all. BTW, I think we should have a setting
for this; for example, if -V is -1, don't vacuum.
> I'm open to discussion on changing the defaults. Perhaps what it would
> be better to use some non-linear (perhaps logorithmic) scaling factor.
> So that you wound up with something roughly like this:
>
> #tuples activity% for vacuum
> 1k 100%
> 10k 70%
> 100k 45%
> 1M 20%
> 10M 10%
> 100M 8%
That would be cool, too. Though a count of data pages would be a better
scale than a count of rows, and equally obtainable from pg_class.
> Thanks for the lucid feedback / discussion. autovacuum is a feature
> that, despite it's simple implementation, has generated a lot of
> feedback from users, and I would really like to see it become something
> closer to what it should be.
Well, I hope to help now. Until very recently, I've not had a chance to
seriously look at pg_avd and test it in production. Now that I do, I'm
interested in improving it.
--
Josh Berkus
Aglio Database Solutions
San Francisco
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2003-11-21 07:16:18 | Re: [HACKERS] with(isstrict) vs ISSTRICT |
Previous Message | Tom Lane | 2003-11-21 06:20:53 | Re: Sponsoring enterprise features |
From | Date | Subject | |
---|---|---|---|
Next Message | Ryszard Lach | 2003-11-21 08:53:17 | Re: duration logging setting in 7.4 |
Previous Message | stephen farrell | 2003-11-21 02:05:09 | Re: Problem with insert into select... |