Re: Performance Bottleneck

From: Martin Foster <martin(at)ethereal-realms(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance Bottleneck
Date: 2004-08-04 03:49:11
Message-ID: X0ZPc.23543$yT2.23053@clgrps13
Lists: pgsql-performance

Gaetano Mendola wrote:
> Martin Foster wrote:
>
>> I run a Perl/CGI driven website that makes extensive use of PostgreSQL
>> (7.4.3) for everything from user information to formatting and display
>> of specific sections of the site. The server itself is a dual-processor
>> AMD Opteron 1.4GHz with 2GB RAM and 2 x 120GB hard drives mirrored for
>> redundancy, running FreeBSD 5.2.1 (AMD64).
>>
>> Recently, loads on the site have increased during peak hours to the
>> point of showing considerable loss in performance. This can be
>> observed when concurrent connections to PostgreSQL climb from around
>> 120 to roughly 175 or more. Essentially, the machine seems to
>> struggle to keep up with the continual requests and slows down
>> accordingly as resources are tied up.
>>
>> Code changes have been made to the scripts so that they essentially
>> back off under high load, which has worked to an extent. However, as
>> loads continue to increase, the database itself is not taking well to
>> the added traffic.
>>
>> I have taken a look at 'Tuning PostgreSQL for Performance'
>> (http://www.varlena.com/GeneralBits/Tidbits/perf.html) and used it as
>> best I could to choose my settings. However, even with statistics
>> disabled and every setting tweaked, things still continue to
>> deteriorate.
>>
>> Is there anything anyone can recommend in order to give the system a
>> necessary speed boost? It would seem to me that a modest dataset of
>> roughly a gig combined with that type of hardware should be able to
>> handle substantially more load than it does. Can anyone provide me
>> with clues as to where to look? Would disabling 'fsync' provide more
>> performance if I accept that information may be lost in the event of
>> a crash?
>>
>> If anyone needs access to logs, settings, et cetera, please ask. I
>> simply wish to test the waters first on what is needed. Thanks!
>
>
> Tell us about your typical queries and show us your configuration file.
> Is the access read-only, or do you have concurrent readers and
> writers? During peak hours, are your processors pegged at 100%?
> What do vmstat and iostat say?
>
> Maybe you are not using indexes somewhere, or maybe you are but the
> planner is not using them... In short, we need more information in
> order to help you.
>
>
>
> Regards
> Gaetano Mendola
>
>

I included all the files as attachments, which will hopefully cut down
on back-and-forth emails. As for connection pooling, the web server
makes use of Apache::DBI to pool the connections for the Perl scripts
being driven on that server. For the sake of being thorough, a quick
'apachectl status' was thrown in while the database was under a good load.
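For anyone wanting to reproduce these snapshots, the commands behind the
attachments can be wrapped in a small script. This is only a sketch: the
filename scheme is hypothetical, the flags are FreeBSD-style, and it
assumes apachectl is in the PATH (adjust for your platform):

```shell
#!/bin/sh
# snapshot.sh -- capture a quick load snapshot to attach to a reply
# (sketch; flags are FreeBSD-style, filename scheme is hypothetical)
TS=$(date +%Y%m%d-%H%M%S)
OUT="snapshot-$TS.txt"
{
  echo "== uptime ==";        uptime
  echo "== vmstat ==";        vmstat 1 2        # second sample is the meaningful one
  echo "== iostat ==";        iostat -w 1 -c 2  # FreeBSD syntax; 'iostat 1 2' on Linux
  echo "== apache status =="; apachectl status
} > "$OUT" 2>&1
echo "wrote $OUT"
```

Errors from any missing tool are captured into the snapshot file rather
than aborting the run, so a partial snapshot is still produced.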

Since waiting for the servers to get really bogged down, with load
averages of 20.00 or more, would rather slow things down, I opted for a
period when we are a bit busier than normal. You will be able to see
how the system behaves under a light load and then as it reaches 125 or
so concurrent connections.

The queries themselves are simple, normally drawing information from one
table with few conditions or, in the most complex cases, using joins on
two tables or subqueries. These behave very well and always have; the
problem is that these queries arrive in rather large numbers due to the
dumb nature of the scripts themselves.

Over a year ago, when I was still using MySQL for the project, the
statistics generated reported well over 65 queries per second under
loads ranging from 130 to 160 at peak, averaged over weeks of
operation. Looking at the Apache status, one can see that it averages
only roughly 2.5 requests per second, which gives some indication of
what is taking place.

A quick run of 'systat -ifstat' shows the following graph:

/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average >>>>>>>>>>>

Interface Traffic Peak Total
lo0 in 0.000 KB/s 0.000 KB/s 37.690 GB
out 0.000 KB/s 0.000 KB/s 37.690 GB

em0 in 34.638 KB/s 41.986 KB/s 28.998 GB
out 70.777 KB/s 70.777 KB/s 39.553 GB

em0 is a full-duplex 100Mbps connection to an internal switch that
supports the servers directly. Load on the loopback dropped
considerably once I stopped using pg_autovacuum, since its performance
benefits under low load were buried under the hindrance it caused when
traffic was high.
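With pg_autovacuum out of the picture, the routine maintenance it
provided still has to happen somehow; one option is a nightly cron entry
for the postgres user. A sketch only: the database name 'ethereal' is a
placeholder, not taken from the attachments.

```
# crontab for the postgres user: nightly maintenance during quiet hours
# (sketch; replace 'ethereal' with the real database name)
30 4 * * *  psql -d ethereal -c 'VACUUM ANALYZE'
```

This trades pg_autovacuum's continuous overhead for one predictable
burst of work at a time when traffic is lowest.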

I am sure that there are some places that could benefit from
optimization, especially in the case of indexes; however, as a whole the
problem seems to be related more to the massive onslaught of queries
than to anything else.

Also note that some of these scripts run for longer durations even
though they are web based. Some run as long as 30 minutes, making
queries to the database with waits of anywhere from five seconds to
twenty-five seconds between them. Under high duress the timeouts back
off further, based on the time needed for a query to respond, which
normally averages 0.008 seconds.
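The back-off behaviour described above can be sketched as a loop that
doubles its wait between attempts, clamped to the five-to-twenty-five
second window. This is a hypothetical stand-in, not the actual Perl from
the scripts; run_query here merely simulates a query that only succeeds
once the back-off has grown past a threshold:

```shell
#!/bin/sh
# Sketch of the scripts' back-off: wait grows from 5s toward a 25s cap.
delay=5
max_delay=25

run_query() {
  # Hypothetical stand-in: pretend the query only succeeds once we
  # have backed off past 15 seconds (i.e. load has eased).
  [ "$delay" -gt 15 ]
}

until run_query; do
  echo "load high, backing off ${delay}s"
  sleep "$delay"
  delay=$((delay * 2))
  [ "$delay" -gt "$max_delay" ] && delay=$max_delay
done
echo "query succeeded after backing off to ${delay}s"
```

Doubling with a cap keeps the retry pressure low during spikes without
ever making an interactive script wait longer than the stated maximum.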

Does this help at all, or is more detail needed on the matter?

Martin Foster
martin(at)ethereal-realms(dot)org

Attachment Content-Type Size
postgresql.conf text/plain 7.3 KB
apache_load.txt text/plain 1.9 KB
pg_load.txt text/plain 3.6 KB
pg_norm.txt text/plain 3.7 KB
