From: | "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> |
---|---|
To: | "Simon Riggs" <simon(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: log_duration and log_statement |
Date: | 2006-03-14 13:27:22 |
Message-ID: | 1d4e0c10603140527j63bcaea6t4bf15d52ffeab880@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Simon,
On 3/14/06, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Thanks very much for writing pgFouine. We've been doing our best to
> support generation of useful logs for performance analysis, so please
> feel free to ask for anything you see a need for.
OK. Thanks. I should write something clear about what we need one day
or another.
The two problems I have in mind currently are:
- problem to isolate a transaction as a whole. I mean I'd like to have
something to aggregate the BEGIN; QUERY1; QUERY2; COMMIT; as a whole ;
- problem with the way queries are logged when using PG 8.1 and the
8.1 jdbc driver. All our queries are logged as PREPARE and EXECUTE and
we don't have the variables passed in the log (so no value for $1,
$2... and we have the same problem to aggregate the real time of a
query) but I saw a few threads on -hackers talking about this problem.
> I note your graphs don't show missing values: on the bottom graph there
> is no data for 7pm and 2am, yet the graph passes directly from 6 to 8pm
> as if the figure for 7pm was mid-way between them, rather than zero.
Which is obviouly wrong. I missed it as it's not a common case on our
db. Thanks for the report.
> How do you tell the difference between a SELECT and a Write query when
> you do not see the text of the query?
A SELECT is a query beginning with SELECT. It's far from being perfect
but it works most of the time for our projects.
Our main problem is for queries like SELECT update_tables(); which are
inserts/delete in a stored procedure.
If you know any way to do it better, I'm open to any suggestion.
I don't know if it's possible for the backend to log this sort of information.
> I think I need more persuasion before I see the value of this, but I'm
> not going to immediately disregard this either.
>
> Collecting information is interesting, but it must in some way lead to a
> rational corrective action. Logging the duration of statements without
> recording what they are would tell you there was a problem but make it
> difficult to move towards an action rationally. Perhaps I'm
> misunderstanding this.
IMHO, the duration information are interesting as I can see when my
database is heavily loaded which is not correlated with the http
requests (two levels of cache) or with the server load on this
particular website.
Let's admit we have a slow down between 1am and 2pm. There can be a
lot of reasons for that:
- a lot of selects at this time because we are under heavy load from
our website;
- a lot of updates (cronjobs for example);
- another unknow reason we should try to find.
Having duration for all my queries can help me to understand the
problem as I will know:
- how many queries the database really executed at this particular time;
- how slow they are.
If I don't have a global slow down, perhaps, at this time I have a
specific table locked for an unknown reason.
I agree with you that it won't give me all the information to solve my
problem but _it allows me to detect the problem_ and it can at least
give me guidelines to how I can detect it more accurately.
It also gives us a way to see the evolution from one day to another as
we just operates the database, we don't develop the website.
If I have a boost from 1 million queries a day to 20 millions queries
a day, I won't know it with log_min_duration_statement and I can
detect it if log_duration logs every query.
> Is the issue that the log volume is too high? We might be able to look
> at ways to reduce/compress the log volume for duration/statement
> logging.
Well, I'm not sure we can reduce the size of a syslog log but perhaps
I'm wrong. Our problem is both the volume and the performance drop
introduced by logging.
> Another thought might be to provide a random sample of queries. A 10%
> sample would be sufficient for your case here and yet would provide a
> possibility of deeper analysis also.
Why not but I won't have a global overview just a statistical
information and I'd really like to have this overview without logging
every query with log_statement (which we do sometimes for a few
minutes when we really need more information about a specific
problem).
--
Guillaume
From | Date | Subject | |
---|---|---|---|
Next Message | Paulina Quezada | 2006-03-14 15:14:32 | Pregunta sobre limitar el número de registros a borrar... |
Previous Message | Guillaume Smet | 2006-03-14 12:56:30 | Re: log_duration and log_statement |