Re: Sort and index

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dave Held <dave(dot)held(at)arrayservicesgrp(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sort and index
Date: 2005-04-23 01:54:04
Message-ID: 20050423015404.GV58835@decibel.org
Lists: pgsql-performance

I've run some performance tests. The actual test case is at
http://stats.distributed.net/~decibel/timing.sql, and the results are at
http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing
an index scan appears to be about 2x faster than a sequential scan and a
sort.
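
In outline, one way to run that comparison is the sketch below; the actual
timing.sql may do it somewhat differently, so treat this as illustrative only:

-- Time the seqscan + sort plan
SET enable_indexscan = off;
SET enable_seqscan = on;
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;

-- Time the index scan plan
SET enable_seqscan = off;
SET enable_indexscan = on;
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;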

Something else of interest: going from 50MB of sort memory to 3GB sped the
sort up by 900 seconds. If someone wants to record data about the effect of
sort_mem on on-disk sorts somewhere (maybe in the docs?), I can run some more
tests for that case.
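
Roughly, such a test is just the same sort repeated at different sort_mem
settings, something like the following (the values are only illustrative of
the 50MB and 3GB cases, since sort_mem is specified in KB):

SET sort_mem = 51200;     -- roughly 50MB
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;

SET sort_mem = 3145728;   -- roughly 3GB
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;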

In any case, it's clear that the planner is making the wrong choice
here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan.
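
(That check was nothing more elaborate than re-running the EXPLAIN under the
different settings, along these lines:)

SET random_page_cost = 3;
EXPLAIN SELECT * FROM email_contrib ORDER BY project_id, id, date;

SET random_page_cost = 4;
EXPLAIN SELECT * FROM email_contrib ORDER BY project_id, id, date;
-- either way, the plan stays a Sort over a Seq Scan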

On Tue, Apr 19, 2005 at 10:40:41PM -0500, Jim C. Nasby wrote:
> On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <decibel(at)decibel(dot)org> writes:
> > > Actually, the planner (at least in 7.4) isn't smart enough to consider
> > > if the sort would fit in memory or not.
> >
> > Really? Have you read cost_sort()?
> >
> > It's certainly possible that the calculation is all wet, but to claim
> > that the issue is not considered is just wrong.
>
> To be fair, no, I haven't looked at the code. This is based strictly on
> anecdotal evidence on a 120M row table. I'm currently running a test to
> see how an index scan compares to a seqscan. I also got the same results
> when I added a where clause that would restrict it to about 7% of the
> table.
>
> Actually, after running some tests (below), the plan cost does change
> when I change sort_mem (it was originally 50000).
>
> stats=# \d email_contrib
>         Table "public.email_contrib"
>    Column   |  Type   | Modifiers
> ------------+---------+-----------
>  project_id | integer | not null
>  id         | integer | not null
>  date       | date    | not null
>  team_id    | integer |
>  work_units | bigint  | not null
> Indexes:
>     "email_contrib_pkey" primary key, btree (project_id, id, date)
>     "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
>     "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
>     "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
>     "email_contrib__project_date" btree (project_id, date)
> Foreign-key constraints:
>     "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
>     "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE
>
> stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
> QUERY PLAN
> --------------------------------------------------------------------------------
> Sort (cost=3613476.05..3635631.71 rows=8862263 width=24)
> Sort Key: project_id, id, date
> -> Seq Scan on email_contrib (cost=0.00..2471377.50 rows=8862263 width=24)
> Filter: (project_id = 8)
> (4 rows)
>
> stats=# explain select * from email_contrib order by project_id, id, date;
> QUERY PLAN
> ----------------------------------------------------------------------------------
> Sort (cost=25046060.83..25373484.33 rows=130969400 width=24)
> Sort Key: project_id, id, date
> -> Seq Scan on email_contrib (cost=0.00..2143954.00 rows=130969400 width=24)
> (3 rows)
>
> stats=# select 8862263::float/130969400;
> ?column?
> --------------------
> 0.0676666687027657
> (1 row)
>
> stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------
> Index Scan using email_contrib_pkey on email_contrib (cost=0.00..6832005.57 rows=8862263 width=24)
> Index Cond: (project_id = 8)
> (2 rows)
>
> stats=# explain select * from email_contrib order by project_id, id, date;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
> Index Scan using email_contrib_pkey on email_contrib (cost=0.00..100055905.62 rows=130969400 width=24)
> (1 row)
>
> stats=# set enable_seqscan=on;
> SET
> stats=# set sort_mem=1000;
> SET
> stats=# explain select * from email_contrib order by project_id, id, date;
> QUERY PLAN
> ----------------------------------------------------------------------------------
> Sort (cost=28542316.63..28869740.13 rows=130969400 width=24)
> Sort Key: project_id, id, date
> -> Seq Scan on email_contrib (cost=0.00..2143954.00 rows=130969400 width=24)
> (3 rows)
>
> stats=#
>

--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
