Re: PostgreSQL Query Speed Issues

From: Joseph Pravato <joseph(dot)pravato(at)nomagic(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Tom Lisjac <netdxr(at)gmail(dot)com>, "pgsql-novice(at)postgresql(dot)org" <pgsql-novice(at)postgresql(dot)org>
Subject: Re: PostgreSQL Query Speed Issues
Date: 2013-02-22 22:11:20
Message-ID: 5127ED08.3020408@nomagic.com
Lists: pgsql-novice

On 2/22/2013 8:58 AM, Joseph Pravato wrote:

> ----- Forwarded Message -----
> From: "Kevin Grittner" <kgrittn(at)ymail(dot)com>
> To: "Tom Lisjac" <netdxr(at)gmail(dot)com>, pgsql-novice(at)postgresql(dot)org
> Sent: Friday, February 22, 2013 7:29:58 AM
> Subject: Re: [NOVICE] PostgreSQL Query Speed Issues
>
> Tom Lisjac <netdxr(at)gmail(dot)com> wrote:
>
>> version 9.2.1 running on Centos6/64
> There are performance problems in the 9.2 branch which are fixed in
> 9.2.3. If you care about performance, or for that matter running
> with known bugs fixed, upgrade.
>
> http://www.postgresql.org/support/versioning/

Our server manager is going to update to 9.2.3 this weekend to see if we
get any improvements.

>> The database is running in a Xen VM with 12GB of ram, 4 virtual
>> CPU's and fast, dedicated physical disks rather then shared
>> network storage. The problem queries take one core to saturation
>> and keep it there with very little disk I/O.
>> max_connections = 200
> You may want to consider connection pooling:
>
> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

On our server, we are using connection pooling with a maximum of 50
connections. For the moment, we've lowered max_connections to 100.
Originally, it was set to 200 because we were using a multi-threaded
migration routine to load data into Postgres. However, we've since set up
a second database dedicated to migration and are using pg_dump & pg_restore
instead.

>> shared_buffers = 2GB
> The usual advice is to start at 25% of machine RAM, up to 8GB, and
> check performance with incremental updates from there. That would
> suggest a 4GB starting point.
>
>> effective_cache_size = 1024MB
> This should normally be around 75% of machine RAM, so 9GB. This
> does not actually allocate any RAM, but is used to estimate how
> much of the indexes may be in RAM on repeated access. A larger
> number allows more index usage.

We actually only have 11.37GB on our server, so we've used 3072MB for
shared_buffers & 8400MB for effective_cache_size.
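
Putting those two recommendations together, the relevant postgresql.conf
lines look something like this (values based on our 11.37GB of RAM; adjust
proportionally for other machines):

```
shared_buffers = 3072MB        # ~25% of RAM
effective_cache_size = 8400MB  # ~75% of RAM; planner hint only, allocates nothing
```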

> In addition, with 12GB and apparently not more than 2 million rows
> per table, you seem very likely to have the active portion of your
> database fully cached. So these settings are likely to help:
>
> seq_page_cost = 0.1
> random_page_cost = 0.1
> cpu_tuple_cost = 0.03

Wow, the random_page_cost setting made a huge difference. The query we
mentioned yesterday that took 30 minutes is now down to about 30 seconds.
That's a reasonable speed, though there is still room for improvement.
However, I'm a bit worried about changing the setting so drastically. Could
this change cause issues down the road? We have 4 or 5 tables that will
continue to grow fairly rapidly (roughly 1 million rows every 2-3 years).
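A low-risk way to evaluate a change like this before editing
postgresql.conf is to set it for a single session and re-run the slow
query; a sketch:

```sql
-- Try the lower cost estimates in this session only
SET random_page_cost = 0.1;
SET seq_page_cost = 0.1;
-- re-run the slow query under EXPLAIN ANALYZE and compare plans
RESET random_page_cost;  -- back to the server default
RESET seq_page_cost;
```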

> I normally don't set work_mem above RAM * 0.25 / max_connections,
> so I'd say 10MB to 15MB would probably be good with your
> max_connections setting. If you can lower that with a connection
> pool you might want to go to 20MB or 40MB.

For the moment, it is set to 30MB based on your formula and the new
100-connection limit.
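
Spelling the formula out for our numbers: 11.37GB * 0.25 / 100 connections
is roughly 29MB, rounded up to 30MB. It can also be raised temporarily for
a single known-heavy query instead of globally; a sketch:

```sql
-- Give one big reporting query extra sort/hash memory, session-only
SET work_mem = '256MB';
-- ... run the large sort or aggregate here ...
RESET work_mem;
```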

> You probably want to boost maintenance_work_mem to something like
> 512MB.
>
> If you haven't already done so, run VACUUM ANALYZE at the database
> level. If most of your data was loaded at about the same time, run
> VACUUM FREEZE ANALYZE instead, to prevent a read-and-rewrite pass
> of your entire database at peak OLTP load. Vacuum and analyze are
> routine maintenance that are necessary for the database to perform
> well. Autovacuum can often handle everything for you, but if you
> have an "off-hours" period when load is lower it is often a good
> idea to run a database VACUUM ANALYZE on a daily or weekly basis to
> shift maintenance load to time when it has the least impact. Never
> leave autovacuum disabled beyond a short maintenance or load
> window.

The vacuuming didn't seem to change performance, and it didn't take very
long to run. Autovacuum may have already taken care of most potential
issues.
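
For anyone following along, the database-level commands described above
are run from psql while connected to the database in question:

```sql
VACUUM ANALYZE;         -- routine: reclaim space and refresh planner stats
VACUUM FREEZE ANALYZE;  -- after a bulk load: freeze tuples now, avoiding a
                        -- full read-and-rewrite pass later at peak load
```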

> Regarding your specific problem...
>
> You claim this is how contact_address_map is defined:
>
> create table contact_address_map (
>     contact_id int8 not null,
>     address_id int8 not null,
>     unique (address_id)
> );
>
> ... but your query is looking for rows in that table where
> address_id is null, and finding them. Those can't both be true.

We are migrating a really old database to Postgres and cleaning up all of
the data. In the old system, only one address was allowed per user, so the
migrated data will have at most one entry in the map for each user. We
were looking for users who didn't have an address at all, and the query
was an optimization based on that assumption. We are aware that the query
is not ideal, but it works for the just-after-migration scenario of
validating parts of the migration routine. Do note that we were left
joining the two tables so that all contacts are guaranteed to return, with
a check on address_id to find those without any addresses.
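
The query was along these lines (table and column names reconstructed from
the thread, so treat this as a sketch):

```sql
-- Find contacts with no address: left join, then keep the rows where
-- the map side failed to match
SELECT c.*
FROM contact c
LEFT JOIN contact_address_map m ON m.contact_id = c.id
WHERE m.address_id IS NULL;
```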

> Your plan shows an index scan with no index conditions to pass
> every row in a table, which is much slower than a seqscan. Did you
> turn off enable_seqscan and fail to mention that? Doing so will
> definitely result in sub-optimal performance on queries like the
> one shown, where every row in one of the tables must be read.

No, enable_seqscan is still turned on.

> When you don't provide accurate information, any advice you get may
> be off-target.
>
