Re: select on 1milion register = 6s

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Bruno Rodrigues Siqueira" <bruno(at)ravnus(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: select on 1milion register = 6s
Date: 2007-07-29 04:51:41
Message-ID: dcc563d10707282151m7a2de11ex5f593cfc19b98095@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 7/28/07, Bruno Rodrigues Siqueira <bruno(at)ravnus(dot)com> wrote:
>
> Ok.
> QUERY PLAN
> Sort (cost=11449.37..11449.40 rows=119 width=8) (actual
> time=14431.537..14431.538 rows=2 loops=1)
> Sort Key: to_char(data_encerramento, 'yyyy-mm'::text)
> -> HashAggregate (cost=11448.79..11448.96 rows=119 width=8) (actual
> time=14431.521..14431.523 rows=2 loops=1)
> -> Index Scan using detalhamento_bas_idx3003 on detalhamento_bas
> (cost=0.00..11442.95 rows=11679 width=8) (actual time=0.135..12719.155
> rows=2335819 loops=1)

See the row mismatch there? It expects about 11k rows, gets back 2.3
million. That's a pretty big misestimate. Have you run analyze
recently on this table?

Is there a reason you're doing this:

to_char( data_encerramento ,'yyyy-mm')
between '2006-12' and '2007-01'

when you should be able to just do:

data_encerramento between '2006-12-01' and '2007-01-31'
? that should be able to use good estimates from analyze. My guess
is the planner is making a bad guess because of the way you're
handling the dates.

> SERVER
> DELL PowerEdge 2950
> XEON Quad-Core 3.0Ghz
> 4Gb RAM
> Linux CentOS 5.0 64-bits
> Postgres 8.1.4

> Postgresql.conf
> # - Memory -
>
> shared_buffers = 50000 # min 16 or max_connections*2, 8KB

400 Meg is kind of low for a server with 4 G ram. 25% is more
reasonable (i.e. 125000 buffers)

> work_mem = 3145728 # min 64, size in KB
> maintenance_work_mem = 4194304 # min 1024, size in KB

Whoa nellie! thats ~ 3 Gig of work mem, and 4 gig of maintenance work
mem. In a machine with 4 gig ram, that's a recipe for disaster.

Something more reasonable would be 128000 (~125Meg) for each since
you've limited your machine to 10 connections you should be ok.
setting work_mem too high can run your machine out of memory and into
a swap storm that will kill performance.

> fsync = off # turns forced synchronization on or
> off

So, the data in this database isn't important? Cause that's what
fsync = off says to me. Better to buy yourself a nice battery backed
caching RAID controller than turn off fsync.

> effective_cache_size = 41943040 # typically 8KB each

And you're machine has 343,604,830,208 bytes of memory available for
caching? Seems a little high to me.

> random_page_cost = 1 # units are one sequential page
> fetch

Seldom if ever is it a good idea to bonk the planner on the head with
random_page_cost=1. setting it to 1.2 ot 1.4 is low enough, but 1.4
to 2.0 is more realistic.

> stats_start_collector = off
> #stats_command_string = off
> #stats_block_level = off
> #stats_row_level = off
> #stats_reset_on_server_start = off

I think you need stats_row_level on for autovacuum, but I'm not 100% sure.

Let us know what happens after fixing these settings and running
analyze and running explain analyze, with possible changes to the
query.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jay Kang 2007-07-29 14:22:30 Questions on Tags table schema
Previous Message Bruno Rodrigues Siqueira 2007-07-28 23:07:18 RES: select on 1milion register = 6s