From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Steve McLellan <smclellan(at)mintel(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query performance over a large proportion of data
Date: 2009-03-10 21:57:40
Message-ID: dcc563d10903101457o53a340bcw67531507ce921dac@mail.gmail.com
Lists: pgsql-performance
On Tue, Mar 10, 2009 at 3:12 PM, Steve McLellan <smclellan(at)mintel(dot)com> wrote:
> Hi,
>
> I'd be grateful for any advice we can get... we recently switched from MySQL
> to PostgreSQL on the basis of some trials we carried out with datasets of
> varying sizes and varying rates of contention. For the most part we've been
> pleased with performance, but one particular application runs queries that
> pull back a lot of results across what is for us a large dataset. We've
> noticed enormous speed improvements over MySQL with complex queries, but some
> of these simpler queries are causing us some problems. We were hoping that the
> machine would be able to cache the majority of the database in memory and be
> able to run these kinds of queries very quickly. The box doesn't seem to be
> doing much I/O during these queries, and top usually reports the processor
> usage slowly increasing to about 75% but no higher than that (and then
> dropping once it's finished). We adjusted settings in postgresql.conf as
> recommended in various places on the web. In particular, experimentation led
> us to turn off enable_seqscan, because it consistently led to faster query
> times, but we're not sure why that's the case or if it's a good idea
> generally.
No, it's not. The query planner in PostgreSQL is quite good, and
unless you're sure it's making a pathologically bad decision, turning
things like seqscan on or off globally is kind of like a bullet in the
brain to cure a headache.
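If you do think the planner is making a bad call for this one query,
test it per transaction rather than globally. A sketch -- the join
columns and the date restriction here are made up, substitute your real
query:

BEGIN;
SET LOCAL enable_seqscan = off;   -- only lasts until the transaction ends
EXPLAIN ANALYZE
SELECT count(*)
FROM fact_table f
JOIN record_dimension r ON r.id = f.record_id   -- join columns assumed
JOIN date_dimension d ON d.id = f.date_id
WHERE d.date_value >= '2008-03-10';              -- stand-in for your filter
ROLLBACK;

Compare that plan and its actual times against the same EXPLAIN ANALYZE
run with enable_seqscan left on.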
> This example has been anonymised slightly, although I've checked it for typos.
> Our 'fact_table' has 6 million rows, each of which joins to one of 1.7 million
> rows in record_dimension, and one of 15,000 rows in 'date_dimension'. We have
> other tables that also join to 'fact_table', but for this example these three
> tables suffice. The total size (as reported on the file system, presumably
> including indexes) is 7.5GB. The query below pulls 12 months' worth of data
> (accounting for 20% of the rows in 'fact_table') with restrictions that
> account for 15% of the rows in 'record_dimension'. It's a read-only database
> (we dump it fresh nightly).
>
> The server itself is a dual-core 3.7GHz Xeon Dell (each core reporting 2
> logical CPUs) running an amd64 build of FreeBSD 6.2, and postgres 8.3.5 built
> from source. It's got 400GB storage in RAID-5 (on 5 disks). It has 8GB of
> physical RAM. I'm able to use about 6GB of that for my own purposes; the
> server doesn't do much else but replicate a very low-usage mysql database.
> While it's running postgres only seems to use about 1.2GB of RAM.
What do you mean you're able to use about 6GB for your own purposes?
Note that postgresql relies on the OS to do the majority of its caching,
so if you're doing something that chews up 6GB of RAM on the same
machine, you are affecting pgsql performance on it.
I'll let someone else look through the explain analyze and all, but as
regards your sequential scan being turned off, you're far better off
adjusting seq_page_cost and random_page_cost in postgresql.conf to push
the planner towards random access (i.e. lowering random_page_cost
relative to seq_page_cost). Also, increasing effective_cache_size will
favor index scans over sequential scans. Then use enable_seqscan = off /
on to test whether you have the best query plan or not. Don't just leave
enable_seqscan = off.
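As a rough starting point for the 8GB box you describe, something like
this in postgresql.conf -- the numbers are only illustrative, not
recommendations, so treat them as a sketch to tune from:

effective_cache_size = 6GB    # roughly the RAM the OS has free for caching
random_page_cost = 2.0        # default is 4.0; lower it when most of the
                              # working set is cached
seq_page_cost = 1.0           # the default; keep it as the baseline

All three can be changed with a pg_ctl reload (no restart needed). After
that, re-run the query with enable_seqscan back on and see whether the
planner picks the index plan on its own.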