Re: Join between 2 tables always executes a sequential scan on the larger table

From: Vasilis Ventirozos <v(dot)ventirozos(at)gmail(dot)com>
To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Join between 2 tables always executes a sequential scan on the larger table
Date: 2013-04-03 08:50:06
Message-ID: CAF8jcqp2yAibEFhh-y=_1mmf718psq2b_JzzSZ8r6Ksy6SQpXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Dieter,
If you are asking more than about 20% of the rows the optimizer will choose
to do a seq scan and actually that's the right thing to do. On the second
example of yours the rows here less and that's why it chose to go with the
index.
you can force an index scan by changing the optimizer parameters as other
guys already mentioned

Vasilis Ventirozos

On Wed, Apr 3, 2013 at 11:18 AM, Dieter Rehbein
<dieter(dot)rehbein(at)skiline(dot)cc>wrote:

> Hi Igor,
>
> thanks for the reply. The sequential scan on user_2_competition wasn't my
> main-problem. What really suprised me was the sequential scan on table
> user, which is a sequential scan over one million rows.
>
> Hash Left Join (cost=111357.64..126222.29 rows=41396 width=42) (actual
> time=1982.543..2737.331 rows=41333 loops=1)
> Hash Cond: ((uc.user_id)::text = (u.id)::text)
> -> Seq Scan on user_2_competition uc (cost=0.00..4705.21 rows=41396
> width=33) (actual time=0.019..89.691 rows=41333 loops=1)
> Filter: ((competition_id)::text =
> '3cc1cb9b3ac132ad013ad01316040001'::text)
> Rows Removed by Filter: 80684
> -> Hash (cost=90074.73..90074.73 rows=999673 width=42) (actual
> time=1977.604..1977.604 rows=999673 loops=1)
> Buckets: 2048 Batches: 128 Memory Usage: 589kB
> -> Seq Scan on "user" u (cost=0.00..90074.73 rows=999673
> width=42) (actual time=0.004..1178.827 rows=999673 loops=1) <-- This
> sequential scan is strange.
>
>
> IMHO the reason for the sequential scan on user is, that it is faster than
> an index-scan for 41333 rows. I've tried the same query using a different
> competition id with much less participants (about 1700). That query has a
> query plan as expected:
>
> Nested Loop Left Join (cost=0.00..21385.59 rows=1684 width=42) (actual
> time=1.317..147.781 rows=1757 loops=1)
> -> Seq Scan on user_2_competition uc (cost=0.00..7026.93 rows=1684
> width=33) (actual time=1.262..92.339 rows=1757 loops=1)
> Filter: ((competition_id)::text =
> '3cc1cb963b988f12013bc737b4590001'::text)
> -> Index Scan using user_pkey on "user" u (cost=0.00..8.51 rows=1
> width=42) (actual time=0.030..0.031 rows=1 loops=1757)
> Index Cond: ((id)::text = (uc.user_id)::text)
> Total runtime: 148.068 ms
>
>
> regards
> Dieter
>
>
>
> Am 02.04.2013 um 16:55 schrieb Igor Neyman <ineyman(at)perceptron(dot)com>:
>
> From: Dieter Rehbein [mailto:dieter(dot)rehbein(at)skiline(dot)cc]
> Sent: Tuesday, April 02, 2013 4:52 AM
> To: pgsql-performance(at)postgresql(dot)org
> Subject: Join between 2 tables always executes a sequential scan on the
> larger table
>
> Hi everybody,
>
> in a project I have a performance problem, which I (and my colleagues)
> don't understand. It's a simple join between 2 of 3 tables:
>
> table-1: user (id, user_name, ...). This table has about 1 million
> rows (999673 rows)
> table-2: competition (57 rows)
> table-3: user_2_competition. A relation between user and competition.
> This table has about 100.000 rows
>
> The query is a join between table user_2_competition and user and looks
> like this:
>
> select u.id, u.user_name
> from user_2_competition uc
> left join "user" u on u.id = uc.user_id
> where uc.competition_id = '3cc1cb9b3ac132ad013ad01316040001'
>
> The query returns the ID and user_name of all users participating in a
> competition.
>
> What I don't understand: This query executes a sequential scan on user!
>
>
> The tables have the following indexes:
>
> user_2_competition: there is an index on user_id and an index on
> competition_id (competition_id is a VARCHAR(32) containing UUIDs)
> user: id is the primary key and has therefore a unique index (the ID is a
> VARCHAR(32), which contains UUIDs).
>
> The database has just been restored from a backup, I've executed ANALYZE
> for both tables.
>
> The output of explain analyze (Postgres 9.2.3):
>
> Hash Left Join (cost=111357.64..126222.29 rows=41396 width=42) (actual
> time=1982.543..2737.331 rows=41333 loops=1)
> Hash Cond: ((uc.user_id)::text = (u.id)::text)
> -> Seq Scan on user_2_competition uc (cost=0.00..4705.21 rows=41396
> width=33) (actual time=0.019..89.691 rows=41333 loops=1)
> Filter: ((competition_id)::text =
> '3cc1cb9b3ac132ad013ad01316040001'::text)
> Rows Removed by Filter: 80684
> -> Hash (cost=90074.73..90074.73 rows=999673 width=42) (actual
> time=1977.604..1977.604 rows=999673 loops=1)
> Buckets: 2048 Batches: 128 Memory Usage: 589kB
> -> Seq Scan on "user" u (cost=0.00..90074.73 rows=999673
> width=42) (actual time=0.004..1178.827 rows=999673 loops=1)
> Total runtime: 2740.723 ms
>
>
> I expected to see an index-scan on user_2_competition with a hash join to
> user, not a sequential scan on user. I've tried this with Postgres 9.1 and
> 9.2.3).
>
> Any ideas, what's going on here?
>
> With EXPLAIN ANALYZE I can see, which query plan Postgres is using. Is
> there any way to find out, WHY postgres uses this query plan?
>
> best regards
> Dieter
>
> -----------------------------------------------
>
> Dieter,
> why do you think index-scan on user_2_competition would be better?
>
> Based on huge number of rows returned (41333 out of total ~120000 in the
> table) from this table optimizer decided that Seq Scan is better than index
> scan.
> You don't show QUERY TUNING parameters from Postgresql.conf, are they
> default?
> Playing with optimizer parameters (lowering random_page_cost, lowering
> cpu_index_tuple_cost , increasing effective_cache_size, or just setting
> enable_seqscan = off), you could try to force "optimizer" to use index, and
> see if you are getting better results.
>
> Regards,
> Igor Neyman
>
> Happy Skiing!
>
> Dieter Rehbein
> Software Architect | dieter(dot)rehbein(at)skiline(dot)cc
>
> Skiline Media GmbH
> Lakeside B03
> 9020 Klagenfurt, Austria
>
> fon: +43 463 249445-800
> fax: +43 463 249445-102
>
> "Erlebe Skifahren neu!"
>
> CONFIDENTIALITY: This e-mail and any attachments are confidential and may
> also be privileged. If you are not the designated recipient, please notify
> the sender immediately by reply e-mail and destroy all copies (digital and
> paper). Any unauthorized disclosure, distribution, copying, storage or use
> of the information contained in this e-mail or any attachments is strictly
> prohibited and may be unlawful.
> LEGAL: Skiline Media GmbH - Managing Director: Michael Saringer
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2013-04-03 15:34:33 Re: Planner is getting wrong row count
Previous Message Dieter Rehbein 2013-04-03 08:24:54 Re: Join between 2 tables always executes a sequential scan on the larger table