From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Page-at-a-time Locking Considerations |
Date: | 2008-03-23 00:37:06 |
Message-ID: | 200803230037.m2N0b6c19764@momjian.us |
Lists: | pgsql-hackers |
With no concrete patch or performance numbers, this thread has been
removed from the patches queue.
---------------------------------------------------------------------------
Simon Riggs wrote:
>
> In heapgetpage() we hold the buffer locked while we look for visible
> tuples. That works well in most cases since the visibility check is fast
> if we have status bits set. If we don't have visibility bits set we have
> to do things like scan the snapshot and confirm things via clog lookups.
> All of that takes time and can lead to long buffer lock times, possibly
> across multiple I/Os in the very worst cases.
>
> This doesn't just happen for old transactions. Accessing very recent
> TransactionIds is prone to rare but long waits when we ExtendCLOG().
>
> Such problems are numerically rare, but the buffers with long lock times
> are also the ones that have concurrent or at least recent write
> operations on them. So all SeqScans have the potential to induce long
> wait times for write transactions, even if they are scans on one-block
> tables. Tables with heavy write activity from multiple backends have
> their work spread across multiple blocks, so a SeqScan will hit this
> issue repeatedly as it encounters each current insertion point in the
> table, which greatly increases the chances of it occurring.
>
> It seems possible to just memcpy() the whole block away and then drop
> the lock quickly. That gives a consistent lock time in all cases and
> allows us to do the visibility checks in our own time. It might seem
> that we would end up copying irrelevant data, which is true. But the
> greatest cost is memory access time. If hardware memory prefetch kicks
> in, the memory is retrieved en masse anyway; if it doesn't, we will
> have to wait for each cache line. So the best case is actually an en
> masse retrieval of cache lines, in the common case where blocks are
> fairly full (the vague cutoff is determined by the exact mechanism of
> hardware/compiler-induced memory prefetch).
>
> The copied block would be used only for visibility checks. The main
> buffer would retain its pin and we would pass references to the block
> through the executor as normal. So this would be a change completely
> isolated to heapgetpage().
>
> Was the copy-aside method considered when we introduced page-at-a-time
> mode? Any reasons to think it would be dangerous or infeasible? If not,
> I'll give it a bash and get some test results.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
>
>
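For readers following the thread, the copy-aside idea in the quoted
message can be sketched roughly as below. This is a toy illustration in
C, not PostgreSQL code: ToyBuffer, ToyPage, ToyTuple, toy_lock(),
toy_unlock(), toy_tuple_visible() and toy_getpage_copy_aside() are all
hypothetical names, the spinlock merely stands in for the buffer content
lock, and the one-line visibility test stands in for the snapshot scan
and clog lookups that can be slow in practice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_TUPLES 64   /* hypothetical per-page tuple limit for this toy */

/* Hypothetical tuple header: only the inserting transaction id. */
typedef struct {
    uint32_t xmin;
} ToyTuple;

/* Hypothetical page layout: a count followed by fixed-size tuples. */
typedef struct {
    uint16_t ntuples;
    ToyTuple tuples[MAX_TUPLES];
} ToyPage;

/* Hypothetical shared buffer: a page guarded by a stand-in lock. */
typedef struct {
    atomic_flag locked;
    ToyPage     page;
} ToyBuffer;

static void toy_buffer_init(ToyBuffer *buf)
{
    memset(&buf->page, 0, sizeof(buf->page));
    atomic_flag_clear(&buf->locked);
}

/* Spin until the flag is acquired; stand-in for taking the content lock. */
static void toy_lock(ToyBuffer *buf)
{
    while (atomic_flag_test_and_set_explicit(&buf->locked,
                                             memory_order_acquire))
        ;
}

static void toy_unlock(ToyBuffer *buf)
{
    atomic_flag_clear_explicit(&buf->locked, memory_order_release);
}

/* Stand-in for the real visibility check, which may be expensive. */
static bool toy_tuple_visible(const ToyTuple *tup, uint32_t snapshot_xmax)
{
    return tup->xmin < snapshot_xmax;
}

/*
 * Copy the page aside under the lock, release the lock, then run the
 * visibility checks against the private copy. Offsets of visible tuples
 * are written to visible_out (room for MAX_TUPLES entries); the return
 * value is how many were found.
 */
int toy_getpage_copy_aside(ToyBuffer *buf, uint32_t snapshot_xmax,
                           uint16_t *visible_out)
{
    ToyPage local;
    int     nvisible = 0;

    toy_lock(buf);
    memcpy(&local, &buf->page, sizeof(local));  /* lock held only for this */
    toy_unlock(buf);

    assert(local.ntuples <= MAX_TUPLES);

    /* Potentially slow work happens without the lock held. */
    for (uint16_t i = 0; i < local.ntuples; i++)
    {
        if (toy_tuple_visible(&local.tuples[i], snapshot_xmax))
            visible_out[nvisible++] = i;
    }
    return nvisible;
}
```

In this sketch the lock hold time depends only on the size of the copy,
not on how long the visibility checks take, which is the property the
proposal is after. Whether the extra copy pays for itself would still
need the test results mentioned in the quoted message.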
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +