Re: INDEX Performance Issue

From: Vasilis Ventirozos <v(dot)ventirozos(at)gmail(dot)com>
To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: INDEX Performance Issue
Date: 2013-04-08 20:28:08
Message-ID: CAF8jcqr3b17tyjO6XBPwX+miJYG_=dDvbU2Qu9L87e_hxcJhVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Apr 8, 2013 at 11:18 PM, Mark Davidson <mark(at)4each(dot)co(dot)uk> wrote:

> Wow my results are absolutely appalling compared to both of those which is
> really interesting. Are you running postgres 9.2.4 on both instances? Any
> specific configuration changes?
> Thinking there must be something up with my setup to be getting such a low
> tps compared with you.
>

Both installations are 9.2.4 and both db's have absolutely default
configurations, i can't really explain why there is so much difference
between our results, i can only imagine the initialization, thats why i
asked how you populated your pgbench database (scale factor / fill factor).

Vasilis Ventirozos

> On 8 April 2013 21:02, Vasilis Ventirozos <v(dot)ventirozos(at)gmail(dot)com> wrote:
>
>>
>> -c 10 means 10 clients so that should take advantage of all your cores
>> (see bellow)
>>
>> %Cpu0 : 39.3 us, 21.1 sy, 0.0 ni, 38.7 id, 0.9 wa, 0.0 hi, 0.0 si, 0.0 st
>> %Cpu1 : 38.0 us, 25.0 sy, 0.0 ni, 26.0 id, 4.2 wa, 0.0 hi, 6.8 si, 0.0 st
>> %Cpu2 : 39.3 us, 20.4 sy, 0.0 ni, 39.0 id, 1.3 wa, 0.0 hi, 0.0 si, 0.0 st
>> %Cpu3 : 40.0 us, 18.7 sy, 0.0 ni, 40.0 id, 1.3 wa, 0.0 hi, 0.0 si, 0.0 st
>> %Cpu4 : 13.9 us, 7.1 sy, 0.0 ni, 79.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>> %Cpu5 : 13.1 us, 8.4 sy, 0.0 ni, 78.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>> %Cpu6 : 14.8 us, 6.4 sy, 0.0 ni, 78.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>> %Cpu7 : 15.7 us, 6.7 sy, 0.0 ni, 77.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>
>> i am pasting you the results of the same test on a i7-2600 16gb with a
>> sata3 SSD and the results from a VM with 2 cores and a normal 7200 rpm hdd
>>
>> -- DESKTOP
>> vasilis(at)Disorder ~ $ pgbench -c 10 -t 10000 bench
>> starting vacuum...end.
>> transaction type: TPC-B (sort of)
>> scaling factor: 1
>> query mode: simple
>> number of clients: 10
>> number of threads: 1
>> number of transactions per client: 10000
>> number of transactions actually processed: 100000/100000
>> tps = 1713.338111 (including connections establishing)
>> tps = 1713.948478 (excluding connections establishing)
>>
>> -- VM
>>
>> postgres(at)pglab1:~/postgresql-9.2.4/contrib/pgbench$ ./pgbench -c 10 -t
>> 10000 bench
>> starting vacuum...end.
>> transaction type: TPC-B (sort of)
>> scaling factor: 1
>> query mode: simple
>> number of clients: 10
>> number of threads: 1
>> number of transactions per client: 10000
>> number of transactions actually processed: 100000/100000
>> tps = 1118.976496 (including connections establishing)
>> tps = 1119.180126 (excluding connections establishing)
>>
>> i am assuming that you didn't populate your pgbench db with the default
>> values , if you tell me how you did i will be happy to re run the test and
>> see the differences.
>>
>>
>>
>> On Mon, Apr 8, 2013 at 10:31 PM, Mark Davidson <mark(at)4each(dot)co(dot)uk> wrote:
>>
>>> Thanks for your response Vasillis. I've run pgbench on both machines
>>> `./pgbench -c 10 -t 10000 pgbench` getting 99.800650 tps on my local
>>> machine and 23.825332 tps on the server so quite a significant difference.
>>> Could this purely be down to the CPU clock speed or is it likely
>>> something else causing the issue?
>>> I have run ANALYZE on both databases and tried the queries a number of
>>> times on each to make sure the results are consistent, this is the case.
>>>
>>>
>>> On 8 April 2013 18:19, Vasilis Ventirozos <v(dot)ventirozos(at)gmail(dot)com>wrote:
>>>
>>>>
>>>> Hello Mark,
>>>> PostgreSQL currently doesn't support parallel query so a faster cpu
>>>> even if it has less cores would be faster for a single query, about
>>>> benchmarking you can try pgbench that you will find in the contrib,
>>>> the execution plan may be different because of different statistics,
>>>> have you analyzed both databases when you compared the execution plans ?
>>>>
>>>> Vasilis Ventirozos
>>>>
>>>>
>>>> Been trying to progress with this today. Decided to setup the
>>>>> database on my local machine to try a few things and I'm getting much more
>>>>> sensible results and a totally different query plan
>>>>> http://explain.depesz.com/s/KGd in this case the query took about a
>>>>> minute but does sometimes take around 80 seconds.
>>>>>
>>>>> The config is exactly the same between the two database. The databases
>>>>> them selves are identical with all indexes the same on the tables.
>>>>>
>>>>> The server has an 2 x Intel Xeon E5420 running at 2.5Ghz each, 16GB
>>>>> RAM and the database is just on a SATA HDD which is a Western Digital
>>>>> WD5000AAKS.
>>>>> My desktop has a single i5-3570K running at 3.4Ghz, 16GB RAM and the
>>>>> database is running on a SATA HDD which is a Western Digital WD1002FAEX-0
>>>>>
>>>>> Could anyone offer any reasoning as to why the plan would be so
>>>>> different across the two machines? I would have thought that the server
>>>>> would perform a lot better since it has more cores or is postgres more
>>>>> affected by the CPU speed? Could anyone suggest a way to bench mark the
>>>>> machines for their postgres performance?
>>>>>
>>>>> Thanks again for everyones input,
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>> On 7 April 2013 23:22, Mark Davidson <mark(at)4each(dot)co(dot)uk> wrote:
>>>>>
>>>>>> Takes a little longer with the INNER join unfortunately. Takes about
>>>>>> ~3.5 minutes, here is the query plan http://explain.depesz.com/s/EgBl.
>>>>>>
>>>>>>
>>>>>> With the JOIN there might not be a match if the data does not fall
>>>>>> within one of the areas that is selected in the IN query.
>>>>>>
>>>>>> So if we have data id (10) that might fall in areas ( 1, 5, 8, 167 )
>>>>>> but the user might be querying areas ( 200 ... 500 ) so no match in area
>>>>>> would be found just to be absolutely clear.
>>>>>>
>>>>>> Is it worth considering adding additional statistics on any of the
>>>>>> columns? And / Or additional INDEXES or different types INDEX? Would it be
>>>>>> worth restructuring the query starting with areas and working to join data
>>>>>> to that?
>>>>>>
>>>>>>
>>>>>> On 7 April 2013 16:15, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>>>>>>
>>>>>>> Greg Williamson <gwilliamson39(at)yahoo(dot)com> wrote:
>>>>>>>
>>>>>>> >> Thanks for your response. I tried doing what you suggested so
>>>>>>> >> that table now has a primary key of
>>>>>>> >> ' CONSTRAINT data_area_pkey PRIMARY KEY(area_id , data_id ); '
>>>>>>> >> and I've added the INDEX of
>>>>>>> >> 'CREATE INDEX data_area_data_id_index ON data_area USING btree
>>>>>>> (data_id );'
>>>>>>>
>>>>>>> Yeah, that is what I was suggesting.
>>>>>>>
>>>>>>> >> unfortunately it hasn't resulted in an improvement of the query
>>>>>>> >> performance.
>>>>>>>
>>>>>>> > Did you run analyze on the table after creating the index ?
>>>>>>>
>>>>>>> That probably isn't necessary. Statistics are normally on relations
>>>>>>> and columns; there are only certain special cases where an ANALYZE
>>>>>>> is needed after an index build, like if the index is on an
>>>>>>> expression rather than a list of columns.
>>>>>>>
>>>>>>> Mark, what happens if you change that left join to a normal (inner)
>>>>>>> join? Since you're doing an inner join to data_area and that has a
>>>>>>> foreign key to area, there should always be a match anyway, right?
>>>>>>> The optimizer doesn't recognize that, so it can't start from the
>>>>>>> area and just match to the appropriate points.
>>>>>>>
>>>>>>> --
>>>>>>> Kevin Grittner
>>>>>>> EnterpriseDB: http://www.enterprisedb.com
>>>>>>> The Enterprise PostgreSQL Company
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2013-04-08 20:39:23 Re: INDEX Performance Issue
Previous Message Mark Davidson 2013-04-08 20:18:52 Re: INDEX Performance Issue