From: Bryan Hinton <bryan(at)bryanhinton(dot)com>
To: David Jarvis <thangalin(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Random Page Cost and Planner
Date: 2010-05-27 21:40:12
Message-ID: AANLkTilLahnsIGdmBYjs7Q7uuCspoRXRg6cEGA18PDPx@mail.gmail.com
Lists: pgsql-performance
I agree with Tom's point about avoiding cost parameter adjustments that fit
specific test cases.
A few suggestions, assuming you own this database:
- Check out pg_statio_user_tables and optimize your cache hit ratio on blocks
read. Snapshot it (and pg_stat_bgwriter) from a script at different time
intervals so you can compare readings.
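A quick way to see per-table hit ratios (a sketch to run in psql; the math is hit / (hit + read), with NULLIF guarding against division by zero):

```sql
-- Approximate buffer cache hit ratio per table, worst readers first.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(heap_blks_hit::numeric
             / NULLIF(heap_blks_hit + heap_blks_read, 0), 4) AS hit_ratio
  FROM pg_statio_user_tables
 ORDER BY heap_blks_read DESC;
```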
- Look at pg_buffercache in contrib/ to see what is actually sitting in shared
buffers.
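Once the contrib module is installed, something like this shows which relations dominate shared_buffers (a sketch; on current releases the join column may differ):

```sql
-- Top relations by number of 8 KB buffers currently cached.
SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;
```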
- On Linux, check /proc/meminfo to find out exactly what is going on with your
kernel buffer cache (its size, how it is buffering), and whether your
controller or drive is using a read-ahead cache.
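For example (the device name is an assumption; substitute the volume your database lives on, and note that blockdev may need root):

```shell
# Snapshot kernel memory and page cache usage.
grep -E 'MemTotal|MemFree|^Buffers|^Cached' /proc/meminfo

# Drive read-ahead setting, in 512-byte sectors; /dev/sda is a placeholder.
blockdev --getra /dev/sda 2>/dev/null || true
```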
- You might want to experiment with partial indexes and/or range partitioning
with constraint exclusion.
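For instance (table and column names here are assumptions based on the thread, not your actual schema):

```sql
-- A partial index covering only the rows hot queries actually touch:
CREATE INDEX measurement_recent_idx
    ON measurement (station_id, taken)
 WHERE taken >= DATE '2000-01-01';

-- Range partitioning via inheritance: the CHECK constraint lets the
-- planner skip this child entirely through constraint exclusion.
CREATE TABLE measurement_2010 (
    CHECK (taken >= DATE '2010-01-01' AND taken < DATE '2011-01-01')
) INHERITS (measurement);
```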
- Define the I/O characteristics of the dataset, taking into account index
clustering and index order on in-memory pages (i.e., is it worth
re-clustering?). If you cluster the heap on one index, do you still need the
others?
- Solidify the referential integrity constraints between those tables on
paper, and define the use cases before modifying the tables. I assume this is
a dev database.
- Explore Linux filesystem mount options, e.g. noatime, data=writeback, etc.
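As an illustration, an /etc/fstab entry might look like this (device and mount point are placeholders; data=writeback trades journaling guarantees for speed, so test carefully before using it in production):

```
/dev/sdb1  /var/lib/pgsql  ext3  noatime,data=writeback  0  2
```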
- Maybe look at prepared statements if you are running a lot of similar
queries from a single session? I am assuming a web front end for your
database, with frequently queried region/category/date combinations over a
large, read-only dataset with multiple join conditions.
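At the SQL level that looks like the sketch below; in practice the application driver usually handles this for you (names and types here are illustrative):

```sql
-- Plan once, execute many times within the session.
PREPARE station_report(integer, date, date) AS
  SELECT station_id, taken, amount
    FROM measurement
   WHERE station_id = $1
     AND taken BETWEEN $2 AND $3;

EXECUTE station_report(1234, '2009-01-01', '2009-12-31');
```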
There are some good presentations on pgcon.org from PGCon 2010, which was held
last week:
http://www.pgcon.org/2010/schedule/events/218.en.html
If you take everything into account and model it correctly (not too loose,
not too tight), your solution will be reusable and will save time and
hardware expenses.
Regards -
Bryan
On Thu, May 27, 2010 at 2:43 AM, David Jarvis <thangalin(at)gmail(dot)com> wrote:
> Hi, Bryan.
>
> I was just about to reply to the thread, thanks for asking. Clustering was
> key. After rebooting the machine (just to make sure absolutely nothing was
> cached), I immediately ran a report on Toronto: 5.25 seconds!
>
> Here's what I did:
>
> 1. Created a new set of tables that matched the old set, with
> statistics of 1000 on the station and taken (date) columns.
> 2. Inserted the data from the old hierarchy into the new set, ordered
> by station id then by date (same seven child tables as before: one per
> category).
> - I wanted to ensure a strong correlation between primary key and
> station id.
> 3. Added three indexes per table: (a) station id; (b) date taken;
> and (c) station-taken-category.
> 4. Set the station-taken-category index as CLUSTER.
> 5. Vacuumed the new tables.
> 6. Dropped the old tables.
> 7. Set the following configuration values:
> - shared_buffers = 1GB
> - temp_buffers = 32MB
> - work_mem = 32MB
> - maintenance_work_mem = 64MB
> - seq_page_cost = 1.0
> - random_page_cost = 2.0
> - cpu_index_tuple_cost = 0.001
> - effective_cache_size = 512MB
>
> I ran a few more reports (no reboots, but reading vastly different data
> sets):
>
> - Vancouver: 4.2s
> - Yellowknife: 1.7s
> - Montreal: 6.5s
> - Trois-Riviers: 2.8s
>
> No full table scans. I imagine some indexes are not strictly necessary and
> will test to see which can be removed (my guess: the station and taken
> indexes). The problem was that the station ids were scattered and so
> PostgreSQL presumed a full table scan would be faster.
>
> Physically ordering the data by station ids triggers index use every time.
>
> Next week's hardware upgrade should halve those times -- unless anyone has
> further suggestions to squeeze more performance out of PG. ;-)
>
> Dave
>
>
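[The steps David describes map to SQL roughly as follows; this is a sketch, with table, column, and index names assumed rather than taken from his actual schema, which has seven child tables.]

```sql
-- 1-2. New table with raised statistics targets, loaded in clustering order.
CREATE TABLE measurement_new (LIKE measurement);
ALTER TABLE measurement_new ALTER COLUMN station_id SET STATISTICS 1000;
ALTER TABLE measurement_new ALTER COLUMN taken      SET STATISTICS 1000;
INSERT INTO measurement_new
     SELECT * FROM measurement ORDER BY station_id, taken;

-- 3-4. Three indexes, with the composite one chosen as the CLUSTER index.
CREATE INDEX mn_station_idx ON measurement_new (station_id);
CREATE INDEX mn_taken_idx   ON measurement_new (taken);
CREATE INDEX mn_stc_idx     ON measurement_new (station_id, taken, category_id);
ALTER TABLE measurement_new CLUSTER ON mn_stc_idx;

-- 5. Refresh visibility information and planner statistics.
VACUUM ANALYZE measurement_new;
```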