From: Richard Huxton <dev(at)archonet(dot)com>
To: Jacques Caron <jc(at)directinfos(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance tuning
Date: 2005-03-14 09:41:20
Message-ID: 42355C40.1070207@archonet.com
Lists: pgsql-performance
Jacques Caron wrote:
> I'm preparing a set of servers which will eventually need to handle a
> high volume of queries (both reads and writes, but most reads are very
> simple index-based queries returning a limited set of rows, when not
> just one), and I would like to optimize things as much as possible, so I
> have a few questions on the exact way PostgreSQL's MVCC works, and how
> transactions, updates and vacuuming interact. I hope someone will be
> able to point me in the right direction (feel free to give pointers if I
> missed the places where this is described).
>
> From what I understand (and testing confirms it), bundling many queries
> in one single transaction is more efficient than having each query be a
> separate transaction (like with autocommit on). However, I wonder about
> the limits of this:
>
> - are there any drawbacks to grouping hundreds or thousands of queries
> (inserts/updates) over several minutes in one single transaction? Other
> than the fact that the inserts/updates will not be visible until
> committed, of course. Essentially turning autocommit off, and doing a
> commit once in a while.
1. If any locks are held then they will be held for much longer, causing
other processes to block.
2. PG needs to be able to roll back the changes - thousands of simple
inserts are fine, millions will probably not be.
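For example, a batch like this (table and values hypothetical) pays the
transaction overhead once instead of once per insert:

  BEGIN;
  INSERT INTO event_log (ts, msg) VALUES (now(), 'event 1');
  -- ... hundreds more inserts ...
  INSERT INTO event_log (ts, msg) VALUES (now(), 'event 999');
  COMMIT;

With autocommit on, each INSERT would be its own transaction and pay
its own commit (and fsync) cost.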
> - does this apply only to inserts/updates/deletes or also to selects?
> Another way to put this is: does a transaction with only one select
> actually have much transaction-related work to do? Or, does a
> transaction with only selects actually have any impact anywhere? Does it
> really leave a trace anywhere? Again, I understand that selects grouped
> in a transaction will not see updates done after the start of the
> transaction (unless done by the same process).
There are implications if a SELECT has side-effects - a SELECT can call
a function, and that function might do anything.
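For instance (function and table names hypothetical), this SELECT
inserts a row elsewhere as a side-effect:

  CREATE FUNCTION log_lookup(integer) RETURNS integer AS '
      INSERT INTO lookup_log VALUES ($1, now());
      SELECT $1;
  ' LANGUAGE sql;

  SELECT log_lookup(id) FROM customers;

Hold that open in a long transaction and the inserted rows (and any
locks they take) hang around until you commit.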
> - if during a single transaction several UPDATEs affect the same row,
> will MVCC generate as many row versions as there are updates (like would
> be the case with autocommit) or will they be grouped into one single row
> version?
I believe there will be many versions. Certainly for 8.0 that must be
the case to support savepoints within a transaction.
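So within one transaction (hypothetical table), this leaves two dead
row versions behind:

  BEGIN;
  UPDATE accounts SET balance = balance + 1 WHERE id = 1;  -- new version
  UPDATE accounts SET balance = balance + 1 WHERE id = 1;  -- and another
  COMMIT;
  -- the two superseded versions are dead tuples for VACUUM to reclaim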
> Another related issue is that many of the tables are indexed on a date
> field, and one process does a lot of updates on "recent" rows (which
> lead to many dead tuples), but after that "older" rows tend to remain
> pretty much unchanged for quite a while. Other than splitting the tables
> into "old" and "recent" tables, is there any way to make vacuum more
> efficient? Scanning the whole table for dead tuples when only a small
> portion of the table actually has any does not feel like being very
> efficient in this situation.
Not really.
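The best you can do is vacuum the busy tables often and leave the quiet
ones alone - plain VACUUM takes no exclusive lock, so something like
this (table name hypothetical) can run from cron as often as needed:

  VACUUM ANALYZE recent_data;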
> Other issue: every five minutes or so, I see a noticeable performance
> drop as PostgreSQL checkpoints. This is 7.4.3 with pretty lousy
> hardware, I know 8.0 with decent hardware and separate disk(s) for
> pg_xlog will definitely help, but I really wonder if there is any way to
> reduce the amount of work that needs to be done at that point (I'm a
> strong believer of fixing software before hardware). I have already
> bumped checkpoint_segments to 8, but I'm not quite sure I understand how
> this helps (or doesn't help) things. Logs show 3 to 6 "recycled
> transaction log file" lines at that time, which seems like quite a lot
> of work for a load that's still pretty low. Does grouping more queries
> into transactions help with this? Are there other parameters that can
> affect things, or is it just a matter of how many inserts/updates/deletes
> are done, and the amount of data that was changed?
You might be better off reducing the number of checkpoint segments, and
decreasing the timeout. There is a balance between doing a lot of work
in one go, and the overhead of many smaller bursts of activity.
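As a sketch (values illustrative, not a recommendation), something like
this in postgresql.conf trades infrequent, heavy checkpoints for more
frequent, lighter ones:

  checkpoint_segments = 3     # back to the default
  checkpoint_timeout = 120    # seconds; default is 300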
> Last point: some of the servers have expendable data (and will be
> replicated with slony-I) and will run with fsync off. I have read
> conflicting statements as to what exactly this does: some sources
> indicate that setting fsync off actually switches off WAL/checkpointing,
> others that it just prevents the fsync (or equivalent) system calls.
> Since I still see checkpointing in that case, I guess it's not exactly
> the former, but I would love to understand more about it. Really, I
> would love to be able to set some tables or databases to "go as fast as
> you can and don't worry about transactions, MVCC or anything like that",
> but I'm not sure that option exists...
Setting fsync=false just means the sync isn't done: WAL is still
written and checkpoints still happen, but nothing is forced out to
disk, so data might still be cached below PG's level and an OS crash
or power failure can lose committed transactions. That matches what
you're seeing. I'm not sure it's ever going to be possible to mark a
table as "ignore transactions" - it would be a lot of work, and it
would mean you couldn't guarantee any transaction that included that
table.
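For what it's worth, the switch itself is a single line in
postgresql.conf:

  fsync = false    # WAL still written, just never forced to disk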
--
Richard Huxton
Archonet Ltd