| From: | "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> | 
|---|---|
| To: | "Bruno Rodrigues Siqueira" <bruno(at)ravnus(dot)com> | 
| Cc: | pgsql-performance(at)postgresql(dot)org | 
| Subject: | Re: select on 1milion register = 6s | 
| Date: | 2007-07-29 04:51:41 | 
| Message-ID: | dcc563d10707282151m7a2de11ex5f593cfc19b98095@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
On 7/28/07, Bruno Rodrigues Siqueira <bruno(at)ravnus(dot)com> wrote:
>
> Ok.
> QUERY PLAN
> Sort  (cost=11449.37..11449.40 rows=119 width=8) (actual
> time=14431.537..14431.538 rows=2 loops=1)
>   Sort Key: to_char(data_encerramento, 'yyyy-mm'::text)
>   ->  HashAggregate  (cost=11448.79..11448.96 rows=119 width=8) (actual
> time=14431.521..14431.523 rows=2 loops=1)
>         ->  Index Scan using detalhamento_bas_idx3003 on detalhamento_bas
> (cost=0.00..11442.95 rows=11679 width=8) (actual time=0.135..12719.155
> rows=2335819 loops=1)
See the row mismatch there?  It expects about 11k rows, gets back 2.3
million.  That's a pretty big misestimate.  Have you run analyze
recently on this table?
Is there a reason you're doing this:
to_char( data_encerramento ,'yyyy-mm')
between   '2006-12' and  '2007-01'
when you should be able to just do:
data_encerramento between   '2006-12-01' and  '2007-01-31'
?  that should be able to use good estimates from analyze.  My guess
is the planner is making a bad guess because of the way you're
handling the dates.
>         SERVER
>                   DELL PowerEdge 2950
>                   XEON Quad-Core 3.0Ghz
>                   4Gb RAM
>                   Linux CentOS 5.0 64-bits
>      Postgres 8.1.4
>      Postgresql.conf
> # - Memory -
>
> shared_buffers = 50000                  # min 16 or max_connections*2, 8KB
400 Meg is kind of low for a server with 4 G ram.  25% is more
reasonable (i.e. 125000 buffers)
> work_mem = 3145728                      # min 64, size in KB
> maintenance_work_mem = 4194304          # min 1024, size in KB
Whoa nellie!  thats ~ 3 Gig of work mem, and 4 gig of maintenance work
mem.  In a machine with 4 gig ram, that's a recipe for disaster.
Something more reasonable would be 128000 (~125Meg) for each since
you've limited your machine to 10 connections you should be ok.
setting work_mem too high can run your machine out of memory and into
a swap storm that will kill performance.
> fsync = off                             # turns forced synchronization on or
> off
So, the data in this database isn't important?  Cause that's what
fsync = off says to me.  Better to buy yourself a nice battery backed
caching RAID controller than turn off fsync.
> effective_cache_size = 41943040 # typically 8KB each
And you're machine has 343,604,830,208 bytes of memory available for
caching?  Seems a little high to me.
> random_page_cost = 1                    # units are one sequential page
> fetch
Seldom if ever is it a good idea to bonk the planner on the head with
random_page_cost=1.  setting it to 1.2 ot 1.4 is low enough, but 1.4
to 2.0 is more realistic.
> stats_start_collector = off
> #stats_command_string = off
> #stats_block_level = off
> #stats_row_level = off
> #stats_reset_on_server_start = off
I think you need stats_row_level on for autovacuum, but I'm not 100% sure.
Let us know what happens after fixing these settings and running
analyze and running explain analyze, with possible changes to the
query.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jay Kang | 2007-07-29 14:22:30 | Questions on Tags table schema | 
| Previous Message | Bruno Rodrigues Siqueira | 2007-07-28 23:07:18 | RES: select on 1milion register = 6s |