From: | Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr> |
---|---|
To: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance |
Date: | 2009-04-06 12:20:47 |
Message-ID: | 49D9F39F.40106@megafon.hr |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Scott Marlowe wrote:
>
> It's not really solved, it's just a happy coincidence that the current
> plan runs well. In order to keep the query planner making good
> choices you need to increase stats target for the field in the index
> above. The easiest way to do so is to do this:
>
> alter database mydb set default_statistics_target=100;
>
> and run analyze again:
>
> analyze;
So, i removed the index on field_name, set
default_default_statistics_target to 100, analyzed, and the results are
the same:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.04..58676.31 rows=218048 width=68) (actual
time=0.067..12268.394 rows=6 loops=1)
Hash Cond: ((u.field_name)::text = (t.key)::text)
-> Seq Scan on photo_info_data u (cost=0.00..47500.30 rows=2398530
width=50) (actual time=0.013..6426.611 rows=2398446 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
-> Seq Scan on t_query_data t (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.006 rows=2 loops=1)
Total runtime: 12268.459 ms
(6 rows)
I even changed default_statistics_target to 1000:
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.04..58580.29 rows=208561 width=67) (actual
time=0.054..12434.283 rows=6 loops=1)
Hash Cond: ((u.field_name)::text = (t.key)::text)
-> Seq Scan on photo_info_data u (cost=0.00..47499.46 rows=2398446
width=49) (actual time=0.012..6129.923 rows=2398446 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
-> Seq Scan on t_query_data t (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.004 rows=2 loops=1)
Total runtime: 12434.338 ms
(6 rows)
Even when I run this query, I get sequential scan:
explain analyze select * from photo_info_data where field_name =
'f-spot' or field_name = 'shutter';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on photo_info_data (cost=0.00..59491.69 rows=1705 width=49)
(actual time=0.018..1535.963 rows=6 loops=1)
Filter: (((field_name)::text = 'f-spot'::text) OR
((field_name)::text = 'shutter'::text))
Total runtime: 1536.010 ms
(3 rows)
These are the representations of te values 'f-spot' and 'shutter' for
the field field_name in photo_info_data table:
xmltest=# select field_name, count(*) from user_info_data where
field_name in ('visina', 'spol') group by field_name;
field_name | count
------------+-------
'f-spot' | 3
'shutter' | 3
(2 rows)
Maybe my test-data is poor? As I've mentioned, photo_info_data has
little over 2300000 rows. And this is complete 'distribution' of the data:
xmltest=# select field_name, count(*) from user_info_data group by
field_name order by count(*) desc;
field_name | count
----------------+--------
field_Xx1 | 350000
field_Xx2 | 332447
field_Xx3 | 297414
field_Xx4 | 262394
field_Xx5 | 227396
field_Xx6 | 192547
field_Xx7 | 157612
field_Xx8 | 122543
field_Xx9 | 87442
field_Xx10 | 52296
field_1 | 50000
field_2 | 47389
field_3 | 42412
field_4 | 37390
field_5 | 32366
field_6 | 27238
field_7 | 22360
field_Xx11 | 17589
field_8 | 17412
field_9 | 12383
field_10 | 7386
field_11 | 2410
f-spot | 3
shutter | 3
focal | 3
flash | 3
m_city | 3
person | 3
iso | 2
(29 rows)
No matter what field_name value I enter in WHERE condition, planner
chooses sequential scan. Only when I add seperate index on field_name,
planner chooes index scan or bitmap index scan.
Mike
From | Date | Subject | |
---|---|---|---|
Next Message | Matthew Wakeling | 2009-04-06 12:24:48 | Re: difficulties with time based queries |
Previous Message | Matthew Wakeling | 2009-04-06 11:47:40 | Re: plpgsql arrays |