Re: Planner incorrectly choosing seq scan over index scan

From: Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com>
To: John Arbash Meinel <john(at)arbash-meinel(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner incorrectly choosing seq scan over index scan
Date: 2005-08-01 23:56:13
Message-ID: fc5b04ca05080116564d1b3323@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks John. I've answered your questions below:

Has lte_user and candidates617004 been recently ANALYZEd? All estimates,
> except for the expected number of rows from lte_user seem to be okay.

I ANALYZEd both tables just before putting together my first email. And,
unfortunately, modifying the statistics target didn't help either.

Is user_id the primary key for lte_user?

Yes

I'm trying to figure out how you can get 50k rows, by searching a
> primary key, against a 3.5k rows. Is user_id only part of the primary
> key for lte_user?

Hmmm ... I missed that before. But, that surprises me too. Especially since
sourceId in the candidates table has only 1 value. Also, user_id is the
complete primary key for lte_user.

Can you give us the output of:
> \d lte_user
> \d candidates617004

Sure, here they are:

lte=# \d lte_user
Table "public.lte_user"
Column | Type | Modifiers
---------------+-----------------------------+-----------
user_id | bigint | not null
firstname | character varying(255) |
lastname | character varying(255) |
address1 | character varying(255) |
address2 | character varying(255) |
city | character varying(255) |
state | character varying(255) |
zip | character varying(255) |
phone1 | character varying(255) |
phone2 | character varying(255) |
username | character varying(255) |
password | character varying(255) |
deleted | boolean | not null
ext_cust_id | character varying(255) |
aboutme | character varying(255) |
birthday | timestamp without time zone |
fm_id | bigint |
ar | double precision |
Indexes:
"lte_user_pkey" PRIMARY KEY, btree (user_id)
"idx_user_extid" btree (ext_cust_id)
"idx_user_username" btree (username)
Foreign-key constraints:
"fk_user_fm" FOREIGN KEY (fm_id) REFERENCES fm(fm_id)

lte=# \d candidates617004
Table "public.candidates617004"
Column | Type | Modifiers
--------------+------------------+-----------
fmid | bigint |
sourceid | bigint |
sr | double precision |
targetid | bigint |
tr | double precision |

Also, if you could describe the table layouts, that would help.

Sure. The lte_user table is just a collection of users. user_id is assigned
uniquely using a sequence. During some processing, we create a candidates
table (candidates617004 in our case). This table is usually a temp table.
sourceid is a user_id (in this case it is always 617004) and targetid is
also a user_id (2860 distinct values out of 3467). The rest of the
information is either only used in the select clause or not used at all
during this processing.

Did I miss something in the table layout description that would be helpful?

Thanks for your help!
Meetesh

> -> Sort (cost=9912.07..9920.73 rows=3467 width=40) (actual
> > time=1156.000..1156.000 rows=3467 loops=1)
> > Sort Key: c.sourceid
> > -> Hash Join (cost=8710.44..9708.21 rows=3467 width=40)
> > (actual time=1125.000..1156.000 rows=3467 loops=1)
> > Hash Cond: ("outer".targetid = "inner".user_id)
> > -> Seq Scan on candidates617004 c
> > (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000
> > rows=3467 loops=1)
> > -> Hash (cost=8011.95..8011.95 rows=279395
> > width=16) (actual time=1125.000..1125.000 rows=0 loops=1)
> > -> Seq Scan on lte_user t
> > (cost=0.00..8011.95 rows=279395 width=16) (actual time=0.000..670.000
> > rows=279395 loops=1)
> > Total runtime: 1406.000 ms
> >
> > enable_hashjoin disabled
> > ----------------------------------------
> > QUERY PLAN
> > Sort (cost=14355.37..14364.03 rows=3467 width=48) (actual
> > time=391.000..391.000 rows=3467 loops=1)
> > Sort Key: c.sourceid, c.targetid
> > -> Nested Loop (cost=271.52..14151.51 rows=3467 width=48) (actual
> > time=203.000..359.000 rows=3467 loops=1)
> > -> Merge Join (cost=271.52..3490.83 rows=3467 width=40)
> > (actual time=203.000..218.000 rows=3467 loops=1)
> > Merge Cond: ("outer".user_id = "inner".sourceid)
> > -> Index Scan using lte_user_pkey on lte_user s
> > (cost=0.00..16837.71 rows=279395 width=16) (actual time=0.000..126.000
> > rows=50034 loops=1)
> > -> Sort (cost=271.52..280.19 rows=3467 width=32) (actual
> > time=15.000..30.000 rows=3467 loops=1)
> > Sort Key: c.sourceid
> > -> Seq Scan on candidates617004 c
> > (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000
> > rows=3467 loops=1)
> > -> Index Scan using lte_user_pkey on lte_user t
> > (cost=0.00..3.03 rows=1 width=16) (actual time=0.031..0.036 rows=1
> > loops=3467)
> > Index Cond: ("outer".targetid = t.user_id)
> > Total runtime: 406.000 ms
> >
> > random_page_cost set to 1.5
> > ----------------------------------------------
> > QUERY PLAN
> > Sort (cost=12702.62..12711.29 rows=3467 width=48) (actual
> > time=1407.000..1407.000 rows=3467 loops=1)
> > Sort Key: c.sourceid, c.targetid
> > -> Merge Join (cost=9912.07..12498.77 rows=3467 width=48) (actual
> > time=1391.000..1407.000 rows=3467 loops=1)
> > Merge Cond: ("outer".user_id = "inner".sourceid)
> > -> Index Scan using lte_user_pkey on lte_user s
> > (cost=0.00..12807.34 rows=279395 width=16) (actual time=0.000..46.000
> > rows=50034 loops=1)
> > -> Sort (cost=9912.07..9920.73 rows=3467 width=40) (actual
> > time=1188.000..1188.000 rows=3467 loops=1)
> > Sort Key: c.sourceid
> > -> Hash Join (cost=8710.44..9708.21 rows=3467 width=40)
> > (actual time=1157.000..1188.000 rows=3467 loops=1)
> > Hash Cond: ("outer".targetid = "inner".user_id)
> > -> Seq Scan on candidates617004 c
> > (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..15.000
> > rows=3467 loops=1)
> > -> Hash (cost=8011.95..8011.95 rows=279395
> > width=16) (actual time=1157.000..1157.000 rows=0 loops=1)
> > -> Seq Scan on lte_user t
> > (cost=0.00..8011.95 rows=279395 width=16) (actual time=0.000..750.000
> > rows=279395 loops=1)
> > Total runtime: 1422.000 ms
> >
> > random_page_cost set to 1.5 and enable_hashjoin set to false
> >
> --------------------------------------------------------------------------------------------------
> > QUERY PLAN
> > Sort (cost=13565.58..13574.25 rows=3467 width=48) (actual
> > time=390.000..390.000 rows=3467 loops=1)
> > Sort Key: c.sourceid, c.targetid
> > -> Nested Loop (cost=271.52..13361.73 rows=3467 width=48) (actual
> > time=203.000..360.000 rows=3467 loops=1)
> > -> Merge Join (cost=271.52..2762.88 rows=3467 width=40)
> > (actual time=203.000..250.000 rows=3467 loops=1)
> > Merge Cond: ("outer".user_id = "inner".sourceid)
> > -> Index Scan using lte_user_pkey on lte_user s
> > (cost=0.00..12807.34 rows=279395 width=16) (actual time=0.000..48.000
> > rows=50034 loops=1)
> > -> Sort (cost=271.52..280.19 rows=3467 width=32) (actual
> > time=15.000..31.000 rows=3467 loops=1)
> > Sort Key: c.sourceid
> > -> Seq Scan on candidates617004 c
> > (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..15.000
> > rows=3467 loops=1)
> > -> Index Scan using lte_user_pkey on lte_user t
> > (cost=0.00..3.02 rows=1 width=16) (actual time=0.023..0.023 rows=1
> > loops=3467)
> > Index Cond: ("outer".targetid = t.user_id)
> > Total runtime: 406.000 ms
> >
> > Thanks,
> > Meetesh
>
>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-08-02 00:15:26 Re: Planner incorrectly choosing seq scan over index scan
Previous Message Tom Lane 2005-08-01 23:51:10 Re: [PERFORM] COPY FROM performance improvements