From: Stelios Mavromichalis <mstelios(at)cytech(dot)gr>
To: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: recently and selectively slow, but very simple, update query....
Date: 2014-05-06 13:31:25
Message-ID: CAOfbx2aAW3LooxB45Ko6yyB9T8goZdB1i7nwa6OVfxu-K_1F+g@mail.gmail.com
Lists: pgsql-performance
hello again,

some more data on the subject:

you are right, delete/re-insert didn't solve the problem (I haven't yet tried
a dump/restore; I might tonight, during low traffic).

a short snip from the logs, where I log all queries that take longer than 1 sec:
2014-05-06T15:57:46.303880+03:00 s10 postgres[46220]: [1891-1] prosms
prosms LOG: 00000: duration: 1947.172 ms execute <unnamed>: select
easysms_jb_pay($1,
$2)
2014-05-06T15:57:46.304230+03:00 s10 postgres[46220]: [1891-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:46.304439+03:00 s10 postgres[46220]: [1891-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:56.199005+03:00 s10 postgres[58002]: [2570-1] prosms
prosms LOG: 00000: duration: 6869.886 ms execute <unnamed>: select
easysms_jb_pay($1,$2)
2014-05-06T15:57:56.199349+03:00 s10 postgres[58002]: [2570-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:56.199567+03:00 s10 postgres[58002]: [2570-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:59.134982+03:00 s10 postgres[58002]: [2571-1] prosms
prosms LOG: 00000: duration: 1797.747 ms execute <unnamed>: select
easysms_jb_pay($1,$2)
2014-05-06T15:57:59.135334+03:00 s10 postgres[58002]: [2571-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:59.135562+03:00 s10 postgres[58002]: [2571-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:58:07.149477+03:00 s10 postgres[46220]: [1892-1] prosms
prosms LOG: 00000: duration: 3938.979 ms execute <unnamed>: select
easysms_jb_pay($1,$2)
2014-05-06T15:58:07.149830+03:00 s10 postgres[46220]: [1892-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:58:07.150067+03:00 s10 postgres[46220]: [1892-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T16:01:33.784422+03:00 s10 postgres[58002]: [2572-1] prosms
prosms LOG: 00000: duration: 2921.212 ms execute <unnamed>: select
easysms_jb_pay($1,$2)
2014-05-06T16:01:33.784842+03:00 s10 postgres[58002]: [2572-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '4'
2014-05-06T16:01:33.785037+03:00 s10 postgres[58002]: [2572-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989
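For reference, this logging comes from settings along these lines in
postgresql.conf (a sketch; the exact values and prefix format here are my
recollection, not copied from the live config):

```
log_min_duration_statement = 1000   # log statements running longer than 1s (value in ms)
log_line_prefix = '%t %r %p '       # timestamp/host/pid-style prefix (exact format assumed)
log_error_verbosity = verbose       # adds the LOCATION: lines seen above
```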
should you deem it helpful, yes, I can enable full query logging and send you
the listing.
a typical vmstat:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff    cache     si so  bi   bo   in   cs us sy id wa st
 1  0 363476 325852 294064 30520736     0  0  26  347    0    0  1  0 99  0  0
 1  0 363476 328400 294068 30520732     0  0   0   28 4930 8014  6  1 94  0  0
 2  0 363476 331756 294068 30520732     0  0   0 4384 4950 7980  6  1 93  0  0
 0  0 363476 334016 294068 30520756     0  0   0 4384 4961 7981  7  1 92  0  0
 0  0 363476 334700 294068 30520756     0  0   0 4424 4012 6467  4  1 95  0  0
 1  0 363476 330852 294068 30520788     0  0   0    0 2559 3861  5  1 95  0  0
 1  0 363476 331316 294072 30520788     0  0   0 4408 5013 8127  6  1 94  0  0
 1  0 363476 330788 294072 30520788     0  0   0 4384 5535 9055  6  1 93  0  0
 0  0 363476 331496 294072 30520804     0  0   0 4384 5031 8092  7  1 92  0  0
 2  0 363476 331268 294072 30520804     0  0   0 4428 5052 8246  6  1 94  0  0
 1  0 363476 330848 294080 30520812     0  0   0   32 4892 7950  5  1 94  0  0
 1  0 363476 330480 294080 30520812     0  0   0 4388 4935 8036  6  1 94  0  0
 2  0 363476 332616 294084 30521092     0  0   0 4408 5064 8194  6  1 93  0  0
 0  0 363476 333596 294084 30521008     0  0   0 4384 5205 8463  8  1 91  0  0
 1  0 363476 333324 294084 30521008     0  0   0   40 5014 8151  6  1 94  0  0
 0  0 363476 332740 294084 30521016     0  0   0 4384 5047 8163  6  1 93  0  0
 1  0 363476 336052 294084 30520888     0  0   0 4384 4849 7780  6  1 94  0  0
 1  0 363476 334732 294088 30520892     0  0   8 4400 5520 9012  6  1 93  0  0
 0  0 363476 334064 294096 30520884     0  0   0  220 3903 6193  6  1 94  0  0
 0  0 363476 333124 294096 30520916     0  0   0 2232 4088 6462  6  1 93  0  0
the process that is constantly writing the majority of the data is
"postgres: stats collector process", and it varies from 2.5 MB/sec up to 5.7 MB/sec.
all the other postgres (and non-postgres) processes write very little data,
and rarely.
the checkpointer process is at about 78.6 kB/sec (for a few seconds now as I
write this email, but no other process has a constant rate of I/O)
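To put numbers on checkpointer vs. backend write activity from inside the
database, something along these lines could help (the view and columns are
standard in 9.x; the counters are cumulative, so it has to be sampled twice
and diffed to get rates):

```sql
-- cumulative background-writer / checkpointer statistics since stats reset;
-- sample twice a known interval apart and subtract to get per-second rates
SELECT checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,   -- buffers written by the checkpointer
       buffers_clean,        -- buffers written by the background writer
       buffers_backend       -- buffers written directly by backends
FROM pg_stat_bgwriter;
```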
also, _while having the problem_, the results of the following queries
(taken from http://wiki.postgresql.org/wiki/Lock_Monitoring) are:
SELECT relation::regclass, * FROM pg_locks WHERE NOT granted;
(0 rows)
SELECT bl.pid AS blocked_pid,
a.usename AS blocked_user,
ka.query AS blocking_statement,
now() - ka.query_start AS blocking_duration,
kl.pid AS blocking_pid,
ka.usename AS blocking_user,
a.query AS blocked_statement,
now() - a.query_start AS blocked_duration
FROM pg_catalog.pg_locks bl
JOIN pg_catalog.pg_stat_activity a ON a.pid = bl.pid
JOIN pg_catalog.pg_locks kl ON kl.transactionid =
bl.transactionid AND kl.pid != bl.pid
JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid
WHERE NOT bl.granted;
(0 rows)
SELECT
waiting.locktype AS waiting_locktype,
waiting.relation::regclass AS waiting_table,
waiting_stm.query AS waiting_query,
waiting.mode AS waiting_mode,
waiting.pid AS waiting_pid,
other.locktype AS other_locktype,
other.relation::regclass AS other_table,
other_stm.query AS other_query,
other.mode AS other_mode,
other.pid AS other_pid,
 other.granted AS other_granted
FROM
 pg_catalog.pg_locks AS waiting
JOIN
pg_catalog.pg_stat_activity AS waiting_stm
ON (
waiting_stm.pid = waiting.pid
 )
JOIN
pg_catalog.pg_locks AS other
ON (
(
waiting."database" = other."database"
AND waiting.relation = other.relation
)
OR waiting.transactionid = other.transactionid
 )
JOIN
pg_catalog.pg_stat_activity AS other_stm
ON (
other_stm.pid = other.pid
 )
WHERE
 NOT waiting.granted
 AND waiting.pid <> other.pid;
(0 rows)
WITH RECURSIVE
c(requested, current) AS
( VALUES
('AccessShareLock'::text, 'AccessExclusiveLock'::text),
('RowShareLock'::text, 'ExclusiveLock'::text),
('RowShareLock'::text, 'AccessExclusiveLock'::text),
('RowExclusiveLock'::text, 'ShareLock'::text),
('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('RowExclusiveLock'::text, 'ExclusiveLock'::text),
('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),
('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),
('ShareLock'::text, 'RowExclusiveLock'::text),
('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),
('ShareLock'::text, 'ShareRowExclusiveLock'::text),
('ShareLock'::text, 'ExclusiveLock'::text),
('ShareLock'::text, 'AccessExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'ShareLock'::text),
('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),
('ExclusiveLock'::text, 'RowShareLock'::text),
('ExclusiveLock'::text, 'RowExclusiveLock'::text),
('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('ExclusiveLock'::text, 'ShareLock'::text),
('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('ExclusiveLock'::text, 'ExclusiveLock'::text),
('ExclusiveLock'::text, 'AccessExclusiveLock'::text),
('AccessExclusiveLock'::text, 'AccessShareLock'::text),
('AccessExclusiveLock'::text, 'RowShareLock'::text),
('AccessExclusiveLock'::text, 'RowExclusiveLock'::text),
('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('AccessExclusiveLock'::text, 'ShareLock'::text),
('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('AccessExclusiveLock'::text, 'ExclusiveLock'::text),
('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)
),
l AS
(
SELECT
(locktype,DATABASE,relation::regclass::text,page,tuple,virtualxid,transactionid,classid,objid,objsubid)
AS target,
virtualtransaction,
pid,
mode,
granted
FROM pg_catalog.pg_locks
),
t AS
(
SELECT
blocker.target AS blocker_target,
blocker.pid AS blocker_pid,
blocker.mode AS blocker_mode,
blocked.target AS target,
blocked.pid AS pid,
blocked.mode AS mode
FROM l blocker
JOIN l blocked
ON ( NOT blocked.granted
AND blocker.granted
AND blocked.pid != blocker.pid
AND blocked.target IS NOT DISTINCT FROM blocker.target)
JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)
),
r AS
(
SELECT
blocker_target,
blocker_pid,
blocker_mode,
'1'::int AS depth,
target,
pid,
mode,
blocker_pid::text || ',' || pid::text AS seq
FROM t
UNION ALL
SELECT
blocker.blocker_target,
blocker.blocker_pid,
blocker.blocker_mode,
blocker.depth + 1,
blocked.target,
blocked.pid,
blocked.mode,
blocker.seq || ',' || blocked.pid::text
FROM r blocker
JOIN t blocked
ON (blocked.blocker_pid = blocker.pid)
WHERE blocker.depth < 1000
)
SELECT * FROM r
ORDER BY seq;
(0 rows)
no finding here either :(
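Since the lock checks come up empty, one more thing I could check is whether
the hot row's table is accumulating dead row versions between autovacuum runs,
which would fit the extra buffers read in the bad case. A sketch (the table
name 'balances' is a placeholder for whatever easysms_jb_pay actually updates):

```sql
-- dead-tuple and vacuum activity for the table behind the slow update;
-- 'balances' is a hypothetical name, substitute the real table
SELECT relname,
       n_live_tup,
       n_dead_tup,          -- dead row versions a scan must still step over
       n_tup_upd,
       n_tup_hot_upd,       -- HOT updates avoid index maintenance
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'balances';
```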
best,
/mstelios
Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios
On Tue, May 6, 2014 at 2:54 AM, David G Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> Stelios Mavromichalis wrote
> > as a prior step to dump/restore i am thinking of deleting and
> re-inserting
> > that particular row. that might share some light you think?
>
> I still dislike the randomness of the unresponsiveness...
>
> Every time you perform an update you "delete and insert" that row - that is
> how an update works in MVCC - so doing so explicitly is unlikely to provide
> any benefit. Since that row is continually being inserted, and no other
> rows are having this issue, I'm seriously doubting that a dump/restore is
> going to have any effect either. Note that the index scan took twice as
> long in the bad case - but still reasonable and you didn't notice any
> benefit from a REINDEX. This is what I would expect.
>
> The only other difference, if concurrency has been ruled out, is the 4 vs
> 18
> buffers that had to be read. I cannot imagine that, since all 22 were in
> cache, that simply reading that much more data would account for the
> difference (we're talking a 10,000-fold increase, not 2-to-4-fold). The
> reason for this particular difference, IIUC, is how many candidate tuples
> are present whose visibility has to be accounted for (assuming 1 buffer per
> tuple, you needed to scan 4 vs 18 for visibility in the two queries).
>
> Is there any log file information you can share? Especially if you can set
> log_min_duration_statement so that whenever
> one of these gets adversely delayed it appears in the log along with
> whatever other system messages are being sent. Checkpoints are a typical
> culprit though that should be affecting a great deal more than what you
> indicate you are seeing.
>
> I'm pretty certain you are seeing this here largely because of the
> frequency
> of activity on this particular user; not because the data itself is
> corrupted. It could be some kind of symptom of internal concurrency that
> you just haven't observed yet but it could also be I/O or other system
> contention that you also haven't properly instrumented. Unfortunately that
> is beyond my current help-providing skill-set.
>
> A dump-restore likely would not make anything worse though I'd be surprised
> if it were to improve matters. It also doesn't seem like hardware - unless
> the RAM is bad. Software bugs are unlikely if this had been working well
> before 5 days ago. So, you need to observe the system during both periods
> (good and bad) and observe something that is different - probably not
> within
> PostgreSQL if indeed you've minimized concurrency. And also see if you can
> see if any other queries, executed during both these times, exhibit a
> performance decrease. Logging all statements would help matters greatly if
> you can afford it in your production environment - it would make looking
> for
> internal concurrency much easier.
>
> David J.
>
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/Re-recently-and-selectively-slow-but-very-simple-update-query-tp5802553p5802579.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>