From: | Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: what to revert |
Date: | 2016-05-03 21:01:20 |
Message-ID: | CA+CSw_taAWC5zqa8cjQ6GG0Ca3rTXeWXJ_jD3BTDyLbPwf6EEw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, May 3, 2016 at 9:57 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> If you tell me how to best test it, I do have a 4-socket server sitting idly
> in the corner (well, a corner reachable by SSH). I can get us some numbers,
> but I haven't been following the snapshot_too_old so I'll need some guidance
> on what to test.
I worry about two contention points with the current implementation.
The main one is the locking within MaintainOldSnapshotTimeMapping()
that gets called every time a snapshot is taken. AFAICS this should
show up by setting old_snapshot_threshold to any positive value and
then running a simple read-only pgbench, with a scale factor that fits
within shared buffers, at high concurrency (number of CPUs or a small
multiple). On a single-socket system this does not show up.
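A rough sketch of how that first test could be driven from a shell follows. The database name, scale factor, duration, and the 10min threshold are illustrative assumptions, not values from this thread; it also assumes PGDATA is set for pg_ctl and that nproc is available. old_snapshot_threshold can only be changed at server start, hence the restart between runs.

```shell
# Sketch only: DB, SCALE, DURATION and the threshold value are assumptions.
DB=bench
SCALE=100        # pick a scale factor whose data fits in shared_buffers
DURATION=60      # seconds per run

# Print the select-only pgbench command line for a given client count.
readonly_bench_cmd() {
    echo "pgbench -n -S -M prepared -c $1 -j $1 -T $DURATION $DB"
}

# Compare throughput with the feature disabled (-1) and enabled (10min).
run_comparison() {
    pgbench -i -s "$SCALE" "$DB"
    for threshold in -1 10min; do
        psql -d "$DB" -c "ALTER SYSTEM SET old_snapshot_threshold = '$threshold'"
        pg_ctl restart -w                  # assumes PGDATA is set
        $(readonly_bench_cmd "$(nproc)")   # one client per CPU
    done
}
```

Calling run_comparison prints two pgbench reports; a noticeably lower tps in the second one on a multi-socket machine would point at the locking described above.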
The second one is probably a bit harder to hit:
GetOldSnapshotThresholdTimestamp() has a spinlock that gets hit
every time a scan sees a page that has been modified after the snapshot
was taken. A workload that would tickle this is something that uses a
repeatable read snapshot, builds a non-temporary table and runs
reporting on it. Something like this would work:
BEGIN ISOLATION LEVEL REPEATABLE READ;
DROP TABLE IF EXISTS test_:client_id;
CREATE TABLE test_:client_id (x int, filler text);
INSERT INTO test_:client_id
    SELECT x, repeat(' ', 1000) AS filler FROM generate_series(1,1000) x;
SELECT (SELECT COUNT(*) FROM test_:client_id WHERE x != y)
    FROM generate_series(1,1000) y;
COMMIT;
With this script running with -c4 on a 4-core workstation I'm seeing
the following kind of contention and a >2x loss in throughput:
+  14.77%  postgres  postgres  [.] GetOldSnapshotThresholdTimestamp
-   8.01%  postgres  postgres  [.] s_lock
   - s_lock
      + 88.15% GetOldSnapshotThresholdTimestamp
      + 10.47% TransactionIdLimitedForOldSnapshots
      + 0.71% TestForOldSnapshot_impl
      + 0.57% GetSnapshotCurrentTimestamp
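For anyone wanting to reproduce a profile like the one above, a minimal sketch follows. The script file name, database name, and durations are assumptions; it presumes the transaction above has been saved to that file, that old_snapshot_threshold is already set to a positive value, and that perf is installed with permission for system-wide sampling. pgbench supplies :client_id itself, so each client works on its own table.

```shell
# Sketch only: SCRIPT, DB and the durations are assumptions.
SCRIPT=snapshot_contention.sql   # the transaction above, saved to a file
DB=bench

# Print the pgbench command line for the custom script with N clients.
custom_bench_cmd() {
    echo "pgbench -n -f $SCRIPT -c $1 -j $1 -T 60 $DB"
}

# Run the script with 4 clients while sampling call graphs system-wide.
profile_run() {
    $(custom_bench_cmd 4) &
    perf record -a -g -o perf.data -- sleep 30
    wait                                   # let pgbench finish and report tps
    perf report --stdio -i perf.data
}
```

Running profile_run once with the threshold disabled and once with it enabled gives both the throughput difference and the call-graph breakdown.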
Now this is kind of an extreme example, but I'm willing to bet that on
multi-socket hosts similar issues can crop up with common real-world
use cases.
Regards,
Ants Aasma