Quick Links

Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance

From:	Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance
Date:	2009-04-06 12:20:47
Message-ID:	49D9F39F.40106@megafon.hr
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Scott Marlowe wrote:
>
> It's not really solved, it's just a happy coincidence that the current
> plan runs well. In order to keep the query planner making good
> choices you need to increase stats target for the field in the index
> above. The easiest way to do so is to do this:
>
> alter database mydb set default_statistics_target=100;
>
> and run analyze again:
>
> analyze;

So, i removed the index on field_name, set
default_default_statistics_target to 100, analyzed, and the results are
the same:

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.04..58676.31 rows=218048 width=68) (actual
time=0.067..12268.394 rows=6 loops=1)
Hash Cond: ((u.field_name)::text = (t.key)::text)
-> Seq Scan on photo_info_data u (cost=0.00..47500.30 rows=2398530
width=50) (actual time=0.013..6426.611 rows=2398446 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
-> Seq Scan on t_query_data t (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.006 rows=2 loops=1)
Total runtime: 12268.459 ms
(6 rows)

I even changed default_statistics_target to 1000:

------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.04..58580.29 rows=208561 width=67) (actual
time=0.054..12434.283 rows=6 loops=1)
Hash Cond: ((u.field_name)::text = (t.key)::text)
-> Seq Scan on photo_info_data u (cost=0.00..47499.46 rows=2398446
width=49) (actual time=0.012..6129.923 rows=2398446 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
-> Seq Scan on t_query_data t (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.004 rows=2 loops=1)
Total runtime: 12434.338 ms
(6 rows)

Even when I run this query, I get sequential scan:

explain analyze select * from photo_info_data where field_name =
'f-spot' or field_name = 'shutter';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
Seq Scan on photo_info_data (cost=0.00..59491.69 rows=1705 width=49)
(actual time=0.018..1535.963 rows=6 loops=1)
Filter: (((field_name)::text = 'f-spot'::text) OR
((field_name)::text = 'shutter'::text))
Total runtime: 1536.010 ms
(3 rows)

These are the representations of te values 'f-spot' and 'shutter' for
the field field_name in photo_info_data table:

xmltest=# select field_name, count(*) from user_info_data where
field_name in ('visina', 'spol') group by field_name;
field_name | count
------------+-------
'f-spot' | 3
'shutter' | 3
(2 rows)

Maybe my test-data is poor? As I've mentioned, photo_info_data has
little over 2300000 rows. And this is complete 'distribution' of the data:

No matter what field_name value I enter in WHERE condition, planner
chooses sequential scan. Only when I add seperate index on field_name,
planner chooes index scan or bitmap index scan.

Mike

In response to

Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance at 2009-03-30 16:25:36 from Scott Marlowe

Responses

Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance at 2009-04-06 13:58:05 from Scott Marlowe

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Matthew Wakeling	2009-04-06 12:24:48	Re: difficulties with time based queries
Previous Message	Matthew Wakeling	2009-04-06 11:47:40	Re: plpgsql arrays