"elias ghanem" <e(dot)ghanem(at)acteos(dot)com> wrote:
> here are more details as you requested
You didn't include an EXPLAIN ANALYZE of the UPDATE statement.
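For future reference, something like

  EXPLAIN ANALYZE
  UPDATE IN_SORTIE SET VALIDE = VALIDE WHERE VALEUR < 0.83;

would show where the time is actually going. (Note that EXPLAIN
ANALYZE really executes the statement; wrap it in BEGIN; ...
ROLLBACK; if you don't want the effects to stick.)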
> -The version of postgres is 8.4 (by the way select pg_version() is
> not working but let's concentrate on the query issue)
As far as I know, there is no pg_version() function; try
SELECT version();
Sometimes the exact version is relevant to a performance issue, but
there aren't many fixes for performance regressions in the 8.4 minor
releases, so it might not matter in this particular case.
> -Concerning the postgresql.conf file, I've tried changing the
> default values such as shared_buffers and effective_cache_size,
> but this did not change the result.
Perhaps not, but other settings might help performance. Am I to
understand that you're running an "out of the box" configuration,
with no tuning yet?
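If so, a very rough starting point for a 1.5 GB machine might look
like this (these numbers are guesses at your workload, not
measurements; adjust and test):

  shared_buffers = 256MB        # often ~25% of RAM on a dedicated box
  effective_cache_size = 1GB    # planner hint: OS cache + shared_buffers
  work_mem = 16MB               # per-sort/hash memory; keep it modest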
> -The WAL IS NOT ON A DIFFERENT DISK, THEY ARE ON THE SAME DISK WHERE
> THE DB IS (for the moment I don't have the possibility of moving
> them to another disk but maybe "just for testing" you can tell me
> how I can totally disable WAL if possible).
You can't totally disable it, as it is there primarily to ensure
database integrity. There are several ways to tune it, based on the
number of WAL segments, the WAL buffers, the background writer
aggressiveness, various delays, etc. Understanding the workload is
key to appropriate tuning.
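To give an idea of the knobs involved (illustrative values only;
what's right depends entirely on your workload):

  checkpoint_segments = 16            # fewer, larger checkpoints
  checkpoint_completion_target = 0.9  # spread checkpoint I/O over time
  wal_buffers = 8MB                   # more buffer space for bulk writes
  synchronous_commit = off            # may lose the last few commits on
                                      # a crash, but never corrupts data

For testing, synchronous_commit = off is the closest safe
approximation to "disabling" WAL.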
> I'm using postgresql 8.4 on a Linux machine with 1.5 GB RAM, and I'm
> issuing an update query with a where clause that updates
> approximately 100 000 rows in a table containing approximately
> 3 200 000 rows.
This is not a use case where PostgreSQL shines; it is, however, a
rather unusual one in normal operations. I'm curious why
you're testing this -- if we understood the real problem behind the
test we might be able to provide more useful advice. "Teaching to
the test" has its limitations.
> The update query is very simple: UPDATE IN_SORTIE SET VALIDE =
> VALIDE WHERE VALEUR < 0.83 (the where clause is used to limit the
> affected rows to ~ 100 000, and the "SET VALIDE = VALIDE" is
> intentional, to keep the table's data unchanged).
So you want to optimize a query which does absolutely nothing to the
data. It's not hard to make that particular case *much* faster,
which again leads one to wonder what you're *really* trying to
optimize. If we knew that, it might open up options not applicable
to the synthetic case.
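For instance, 8.4 ships with a built-in trigger function,
suppress_redundant_updates_trigger(), which silently skips any
update that would leave the row unchanged (the trigger name here is
just an example):

CREATE TRIGGER in_sortie_skip_redundant
    BEFORE UPDATE ON IN_SORTIE
    FOR EACH ROW
    EXECUTE PROCEDURE suppress_redundant_updates_trigger();

With that in place your statement would rewrite no rows at all; but
of course that only speeds up the synthetic no-op case, not whatever
the real workload does.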
> (the function is called 100 times with a vacuum analyze after
> each call for the table).
>
> So the average execution time of the function is around 2.5 mins,
> meaning that the update query (+ the vacuum) takes 2.5 mins to
> execute.
Vacuuming normally happens as a background or off-hours process, so
as not to slow down user queries. Now, running ten million updates
(100 calls times ~100 000 rows) against a table with 3.2 million
rows without vacuuming would cause its own set of problems; so we're
back to the question: if you don't really want to make ten million
no-op updates to a three-million-row table, what is it that you *do*
want to do, for which this test is a stand-in? Any advice given
without knowing that would be a shot in the dark.
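(As an aside: since 8.4 you can tune autovacuum per table through
storage parameters, for example

  ALTER TABLE IN_SORTIE SET (autovacuum_vacuum_scale_factor = 0.05);

so a heavily-updated table is vacuumed more aggressively in the
background rather than in the foreground after every run. The 0.05
is purely illustrative.)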
-Kevin