Re: Re: recently and selectively slow, but very simple, update query....

From: Stelios Mavromichalis <mstelios(at)cytech(dot)gr>
To: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: recently and selectively slow, but very simple, update query....
Date: 2014-05-06 13:31:25
Message-ID: CAOfbx2aAW3LooxB45Ko6yyB9T8goZdB1i7nwa6OVfxu-K_1F+g@mail.gmail.com
Lists: pgsql-performance

hello again,

some more data on the subject:

you are right, delete/re-insert didn't solve the problem (i haven't yet tried
dump/restore; i might tonight, when traffic is low).

a short snippet from the logs, where i log all queries that take longer than 1 sec:

2014-05-06T15:57:46.303880+03:00 s10 postgres[46220]: [1891-1] prosms
prosms LOG: 00000: duration: 1947.172 ms execute <unnamed>: select
easysms_jb_pay($1,
$2)
2014-05-06T15:57:46.304230+03:00 s10 postgres[46220]: [1891-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:46.304439+03:00 s10 postgres[46220]: [1891-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:56.199005+03:00 s10 postgres[58002]: [2570-1] prosms
prosms LOG: 00000: duration: 6869.886 ms execute <unnamed>: select
easysms_jb_pay($1,$2)
2014-05-06T15:57:56.199349+03:00 s10 postgres[58002]: [2570-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:56.199567+03:00 s10 postgres[58002]: [2570-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:59.134982+03:00 s10 postgres[58002]: [2571-1] prosms
prosms LOG: 00000: duration: 1797.747 ms execute <unnamed>: select
easysms_jb_pay($1,$2)
2014-05-06T15:57:59.135334+03:00 s10 postgres[58002]: [2571-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:59.135562+03:00 s10 postgres[58002]: [2571-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:58:07.149477+03:00 s10 postgres[46220]: [1892-1] prosms
prosms LOG: 00000: duration: 3938.979 ms execute <unnamed>: select
easysms_jb_pay($1,$2)
2014-05-06T15:58:07.149830+03:00 s10 postgres[46220]: [1892-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:58:07.150067+03:00 s10 postgres[46220]: [1892-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T16:01:33.784422+03:00 s10 postgres[58002]: [2572-1] prosms
prosms LOG: 00000: duration: 2921.212 ms execute <unnamed>: select
easysms_jb_pay($1,$2)
2014-05-06T16:01:33.784842+03:00 s10 postgres[58002]: [2572-2] prosms
prosms DETAIL: parameters: $1 = '10808', $2 = '4'
2014-05-06T16:01:33.785037+03:00 s10 postgres[58002]: [2572-3] prosms
prosms LOCATION: exec_execute_message, postgres.c:1989

should you deem it helpful, yes, i can enable full query logging and send you
the listing.
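
for reference, this is roughly what i mean by that (a sketch; these are the
same parameter names that would go into postgresql.conf, a plain SET only
affects the current session and needs superuser; presumably the threshold is
1000 ms here, given the >1 sec log snippet above):

SET log_min_duration_statement = 0;  -- 0 logs every statement together with its duration
SET log_statement = 'all';           -- or: log every statement text as it arrives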

a typical vmstat:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff    cache   si   so   bi    bo    in    cs us sy id wa st
 1  0 363476 325852 294064 30520736    0    0   26   347     0     0  1  0 99  0  0
 1  0 363476 328400 294068 30520732    0    0    0    28  4930  8014  6  1 94  0  0
 2  0 363476 331756 294068 30520732    0    0    0  4384  4950  7980  6  1 93  0  0
 0  0 363476 334016 294068 30520756    0    0    0  4384  4961  7981  7  1 92  0  0
 0  0 363476 334700 294068 30520756    0    0    0  4424  4012  6467  4  1 95  0  0
 1  0 363476 330852 294068 30520788    0    0    0     0  2559  3861  5  1 95  0  0
 1  0 363476 331316 294072 30520788    0    0    0  4408  5013  8127  6  1 94  0  0
 1  0 363476 330788 294072 30520788    0    0    0  4384  5535  9055  6  1 93  0  0
 0  0 363476 331496 294072 30520804    0    0    0  4384  5031  8092  7  1 92  0  0
 2  0 363476 331268 294072 30520804    0    0    0  4428  5052  8246  6  1 94  0  0
 1  0 363476 330848 294080 30520812    0    0    0    32  4892  7950  5  1 94  0  0
 1  0 363476 330480 294080 30520812    0    0    0  4388  4935  8036  6  1 94  0  0
 2  0 363476 332616 294084 30521092    0    0    0  4408  5064  8194  6  1 93  0  0
 0  0 363476 333596 294084 30521008    0    0    0  4384  5205  8463  8  1 91  0  0
 1  0 363476 333324 294084 30521008    0    0    0    40  5014  8151  6  1 94  0  0
 0  0 363476 332740 294084 30521016    0    0    0  4384  5047  8163  6  1 93  0  0
 1  0 363476 336052 294084 30520888    0    0    0  4384  4849  7780  6  1 94  0  0
 1  0 363476 334732 294088 30520892    0    0    8  4400  5520  9012  6  1 93  0  0
 0  0 363476 334064 294096 30520884    0    0    0   220  3903  6193  6  1 94  0  0
 0  0 363476 333124 294096 30520916    0    0    0  2232  4088  6462  6  1 93  0  0

the process that is constantly writing the majority of the data is:

"postgres: stats collector process"

and its write rate varies from 2.5MB/sec up to 5.7MB/sec.

all the other postgres (and non-postgres) processes write very little data,
and only rarely.

the checkpointer process is at about 78.6KB/sec (for the few seconds as i write
this email; no other process has a constant rate of I/O)
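
in case it helps interpret that I/O pattern, here is what i was planning to
check next (just a sketch of the queries, not output i already have):

-- how much writing is done by checkpoints vs the background writer vs backends
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
  FROM pg_stat_bgwriter;

-- where the stats collector writes its temp stats file; pointing
-- stats_temp_directory at a tmpfs is a common way to take that constant
-- write stream off the disks
SHOW stats_temp_directory;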

also, _while having the problem_, the results of the following queries
(taken from http://wiki.postgresql.org/wiki/Lock_Monitoring) are:

SELECT relation::regclass, * FROM pg_locks WHERE NOT granted;
(0 rows)

SELECT bl.pid AS blocked_pid,
a.usename AS blocked_user,
ka.query AS blocking_statement,
now() - ka.query_start AS blocking_duration,
kl.pid AS blocking_pid,
ka.usename AS blocking_user,
a.query AS blocked_statement,
now() - a.query_start AS blocked_duration
FROM pg_catalog.pg_locks bl
JOIN pg_catalog.pg_stat_activity a ON a.pid = bl.pid
JOIN pg_catalog.pg_locks kl ON kl.transactionid =
bl.transactionid AND kl.pid != bl.pid
JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid
WHERE NOT bl.granted;
(0 rows)

SELECT
waiting.locktype AS waiting_locktype,
waiting.relation::regclass AS waiting_table,
waiting_stm.query AS waiting_query,
waiting.mode AS waiting_mode,
waiting.pid AS waiting_pid,
other.locktype AS other_locktype,
other.relation::regclass AS other_table,
other_stm.query AS other_query,
other.mode AS other_mode,
other.pid AS other_pid,
 other.granted AS other_granted
FROM
 pg_catalog.pg_locks AS waiting
JOIN
 pg_catalog.pg_stat_activity AS waiting_stm
 ON (
  waiting_stm.pid = waiting.pid
 )
JOIN
 pg_catalog.pg_locks AS other
 ON (
  (
   waiting."database" = other."database"
   AND waiting.relation = other.relation
  )
  OR waiting.transactionid = other.transactionid
 )
JOIN
 pg_catalog.pg_stat_activity AS other_stm
 ON (
  other_stm.pid = other.pid
 )
WHERE
 NOT waiting.granted
 AND waiting.pid <> other.pid;
(0 rows)

WITH RECURSIVE
c(requested, current) AS
( VALUES
('AccessShareLock'::text, 'AccessExclusiveLock'::text),
('RowShareLock'::text, 'ExclusiveLock'::text),
('RowShareLock'::text, 'AccessExclusiveLock'::text),
('RowExclusiveLock'::text, 'ShareLock'::text),
('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('RowExclusiveLock'::text, 'ExclusiveLock'::text),
('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),
('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),
('ShareLock'::text, 'RowExclusiveLock'::text),
('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),
('ShareLock'::text, 'ShareRowExclusiveLock'::text),
('ShareLock'::text, 'ExclusiveLock'::text),
('ShareLock'::text, 'AccessExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'ShareLock'::text),
('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),
('ExclusiveLock'::text, 'RowShareLock'::text),
('ExclusiveLock'::text, 'RowExclusiveLock'::text),
('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('ExclusiveLock'::text, 'ShareLock'::text),
('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('ExclusiveLock'::text, 'ExclusiveLock'::text),
('ExclusiveLock'::text, 'AccessExclusiveLock'::text),
('AccessExclusiveLock'::text, 'AccessShareLock'::text),
('AccessExclusiveLock'::text, 'RowShareLock'::text),
('AccessExclusiveLock'::text, 'RowExclusiveLock'::text),
('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('AccessExclusiveLock'::text, 'ShareLock'::text),
('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('AccessExclusiveLock'::text, 'ExclusiveLock'::text),
('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)
),
l AS
(
SELECT
(locktype,DATABASE,relation::regclass::text,page,tuple,virtualxid,transactionid,classid,objid,objsubid)
AS target,
virtualtransaction,
pid,
mode,
granted
FROM pg_catalog.pg_locks
),
t AS
(
SELECT
blocker.target AS blocker_target,
blocker.pid AS blocker_pid,
blocker.mode AS blocker_mode,
blocked.target AS target,
blocked.pid AS pid,
blocked.mode AS mode
FROM l blocker
JOIN l blocked
ON ( NOT blocked.granted
AND blocker.granted
AND blocked.pid != blocker.pid
AND blocked.target IS NOT DISTINCT FROM blocker.target)
JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)
),
r AS
(
SELECT
blocker_target,
blocker_pid,
blocker_mode,
'1'::int AS depth,
target,
pid,
mode,
blocker_pid::text || ',' || pid::text AS seq
FROM t
UNION ALL
SELECT
blocker.blocker_target,
blocker.blocker_pid,
blocker.blocker_mode,
blocker.depth + 1,
blocked.target,
blocked.pid,
blocked.mode,
blocker.seq || ',' || blocked.pid::text
FROM r blocker
JOIN t blocked
ON (blocked.blocker_pid = blocker.pid)
WHERE blocker.depth < 1000
)
SELECT * FROM r
ORDER BY seq;
(0 rows)

no findings here either :(
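
the only other thing i can think to check, given David's point about the extra
buffers that had to be read for visibility, is whether dead row versions are
piling up on that table. a sketch of what i would run (the table name below is
just a placeholder, not the real one):

SELECT relname, n_live_tup, n_dead_tup, n_tup_upd, n_tup_hot_upd,
       last_autovacuum, last_autoanalyze
  FROM pg_stat_user_tables
 WHERE relname = 'the_balance_table';  -- placeholder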

best,

/mstelios

Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios

On Tue, May 6, 2014 at 2:54 AM, David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> Stelios Mavromichalis wrote
> > as a prior step to dump/restore i am thinking of deleting and
> re-inserting
> > that particular row. that might share some light you think?
>
> I still dislike the randomness of the unresponsiveness...
>
> Every time you perform an update you "delete and insert" that row - that is
> how an update works in MVCC - so doing so explicitly is unlikely to provide
> any benefit. Since that row is continually being inserted, and no other
> rows are having this issue, I'm seriously doubting that a dump/restore is
> going to have any effect either. Note that the index scan took twice as
> long in the bad case - but still reasonable and you didn't notice any
> benefit from a REINDEX. This is what I would expect.
>
> The only other difference, if concurrency has been ruled out, is the 4 vs
> 18
> buffers that had to be read. I cannot imagine that, since all 22 were in
> cache, that simply reading that much more data would account for the
> difference (we're talking a 10,000-fold increase, not 2to4-fold). The
> reason for this particular difference, IIUC, is how many candidate tuples
> are
> present whose visibility has to be accounted for (assuming 1 buffer per
> tuple, you needed to scan 4 vs 18 for visibility in the two queries).
>
> Is there any log file information you can share? Especially if you can set
> log_min_statement_duration (or whatever that GUC is named) so that whenever
> one of these gets adversely delayed it appears in the log along with
> whatever other system messages are being sent. Checkpoints are a typical
> culprit though that should be affecting a great deal more than what you
> indicate you are seeing.
>
> I'm pretty certain you are seeing this here largely because of the
> frequency
> of activity on this particular user; not because the data itself is
> corrupted. It could be some kind of symptom of internal concurrency that
> you just haven't observed yet but it could also be I/O or other system
> contention that you also haven't properly instrumented. Unfortunately that
> is beyond my current help-providing skill-set.
>
> A dump-restore likely would not make anything worse though I'd be surprised
> if it were to improve matters. It also doesn't seem like hardware - unless
> the RAM is bad. Software bugs are unlikely if this had been working well
> before 5 days ago. So, you need to observe the system during both periods
> (good and bad) and observe something that is different - probably not
> within
> PostgreSQL if indeed you've minimized concurrency. And also see if you can
> see if any other queries, executed during both these times, exhibit a
> performance decrease. Logging all statements would help matters greatly if
> you can afford it in your production environment - it would make looking
> for
> internal concurrency much easier.
>
> David J.
>
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/Re-recently-and-selectively-slow-but-very-simple-update-query-tp5802553p5802579.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
