Re: strange performance problem (SOLVED)

From: Linos <info(at)linos(dot)es>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: strange performance problem (SOLVED)
Date: 2009-03-01 17:04:25
Message-ID: 49AAC019.8050307@linos.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Linos escribió:
> Linos escribió:
>> Richard Huxton escribió:
>>> Linos wrote:
>>>> Richard Huxton escribió:
>>>>> Linos wrote:
>>>>>> 2009-02-27 13:51:15 CET 127.0.0.1LOG: duración: 4231.045 ms
>>>>>> sentencia:
>>>>>> SELECT "nombre", "subfamilia_id", "id_familia", "hasta", "foto",
>>>>>> "id_seccion", "id_categoria" FROM "modelo_subfamilia"
>>>>>> PSQL with \timing:
>>>>>> -development: Time: 72,441 ms
>>>>>> -server: Time: 78,762 ms
>>>>>> but if i load it from QT or from pgadmin i get more than 4 seconds in
>>>>>> server and ~100ms in develoment machime, if i try the query
>>>>>> without the
>>>>>> "foto" column i get 2ms in development and 30ms in server
>>>>> OK, so:
>>>>> 1. No "foto" - both quick
>>>>> 2. psql + "foto" - both slow
>>>>> 3. QT + "foto" - slow only on server
>>>> 1.No "foto" -both quick but still a noticeable
>>>> difference between them 2ms develoment - 30ms server
>>>> 2. psql + "foto" -both quick really, they are about
>>>> 70ms,
>>>> not bad giving that foto are bytea with small png images.
>>>
>>> Ah, sorry - I read the psql timings as 72 thousand ms (72 seconds) - of
>>> course you're using European decimal marks.
>>>
>>>> 3. QT or WXWindows + "foto" -slow only one server yes.
>>>>
>>>>> The bit that puzzles me is why both are slow in #2 and not in #3.
>>>
>>> OK - well, the fact that both psql are fast means there's nothing too
>>> wrong with your setup. It must be something to do with the application
>>> libraries.
>>>
>>>> After the vacuum full verbose and reindex still the same problem (i had
>>>> tried the vacuum before).
>>>
>>> OK. Worth ruling it out.
>>>
>>>> 1- The same in the two machines, tcp/ip with localhost.
>>>
>>> Hmm...
>>>
>>>> 2- I am exactly the same code in the two machines and the same pgadmin3
>>>> version too.
>>>
>>> Good. We can rule that out.
>>>
>>>> 3- Ever the entire result set.
>>>
>>> Good.
>>>
>>>> 4- I am using es_ES.UTF8 in the two machines
>>>
>>> Good.
>>>
>>>> What can be using wxwindows and QT to access postgresql that psql it is
>>>> not using, libpq?
>>>
>>> Well, I'm pretty sure that pgadmin will be using libpq at some level,
>>> even if there is other code above it.
>>>
>>> Either:
>>>
>>> 1. One machine (the fast one) is actually using unix sockets and not
>>> tcp/ip+localhost like you think.
>>> 2. The networking setup is different on each.
>>> 3. Something your code is doing with the bytea data is slower on one
>>> machine than another. I seem to remember that pgadmin used to be quite
>>> slow at displaying large amounts of data. They did some work on that,
>>> but it might be that your use-case still suffers from it.
>>>
>>> For #1 try the psql test again, but with "-h localhost" and "-h /tmp"
>>> (or whatever directory your unix socket is in - might be
>>> /var/run/postgresql or similar too).
>>
>> Ok, thanks for the trick now i know where to search, after trying with
>> -h localhost psql it is slow too in the server from 80,361 with
>> \timing to 4259,453 using -h localhost. Any ideas what can be the
>> problem here? i am going to make what you suggest and capture analyze
>> the traffic, after find the hole i have tried in other debian server
>> with the same kernel 2.6.26 and i have the same problem (my
>> development machine it is Arch Linux with 2.6.28).
>>
>> Regards,
>> Miguel Angel.
>>
>>> For #2, you can always try timing "psql -h localhost ... > /dev/null" on
>>> both machines. If you capture port 5432 with something like "tcpdump -w
>>> ip.dump host localhost and port 5432" you can then use wireshark to see
>>> exactly why it's slow.
>>>
>>> For #3, I guess you'd need to reduce your code to just fetching the data
>>> and time that. You may have already done this of course.
>>>
>>> HTH
>>>
>
> I have been testing with tcpdump but i dont see the problem in the
> traffic (aside from the fact that it gives big jumps in ms between
> packets of data, but i dont know why), i have tested on other debian
> machines with the same result, i have upgraded kernel to 2.6.28 and
> postgresql to 8.3.6 (equal versions of my Arch Linux Development
> machine), but still have the same problem:
>
> -query with \timing with "psql -d database" ~110ms
> -query with \timing with "psql -d database -h localhost" ~4400ms
>
> Using tcp the cpu of postgresql spike to the max it can borrow within
> the query. I have attached the tcpdump logs of a debian machine and the
> Arch too (maybe anyone can see anything in them that i can not). How i
> can test pure speed in the loopback interface? i have tried iperf but
> seems to be cpu bound so maybe the results are misleading.
>

Okay, i have found the problem, in postgresql.conf the parameter "ssl = true"
seems to slow the clear tcp connections (not ssl enabled) very very much, but
this does not affect my arch Linux machine, only debian ones so i will contact
debian package maintainer so they can investigate it, thanks for the help.

Regards,
Miguel Angel.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Linos 2009-03-01 19:02:10 Re: strange performance problem (SOLVED)
Previous Message Sim Zacks 2009-03-01 15:21:16 Re: function to return rows as columns?