| From: | Shaun Thomas <sthomas(at)optionshouse(dot)com> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> | 
| Subject: | Re: Setting Statistics on Functional Indexes | 
| Date: | 2012-10-24 19:31:11 | 
| Message-ID: | 508841FF.9040303@optionshouse.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
On 10/24/2012 02:11 PM, Tom Lane wrote:
> It's not particularly (not that you've even defined what you think
> "optimistic" is, much less mentioned what baseline you're comparing
> to).
The main flaw with my example is that it's random. But I swear I'm not 
making it up! :)
There seems to be a particularly nasty edge case we're triggering, then. 
Like I said, it's worse when col1+col2 don't match anything. In that 
case, it's using the trunc index on the date column, which has 
demonstrably worse performance. Here are the two analyzes I got 
before/after front-loading statistics.
Before stats increase:
  Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=78.282..78.282 
rows=0 loops=1)
    Sort Key: id, action_date
    Sort Method: quicksort  Memory: 25kB
    ->  Index Scan using idx_date_test_action_date_trunc on date_test 
(cost=0.00..9.37 rows=1 width=23) (actual time=78.274..78.274 rows=0 
loops=1)
          Index Cond: (date_trunc('day'::text, action_date) >= 
'2012-10-24 00:00:00'::timestamp without time zone)
          Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
  Total runtime: 78.317 ms
And then after. I used your unofficial trick to set it to 1000:
alter index idx_date_test_action_date_trunc
       alter column date_trunc set statistics 1000;
analyze date_test;
  Sort  (cost=9.83..9.83 rows=1 width=23) (actual time=0.038..0.038 
rows=0 loops=1)
    Sort Key: id, action_date
    Sort Method: quicksort  Memory: 25kB
    ->  Index Scan using idx_date_test_col1_col2 on date_test 
(cost=0.00..9.82 rows=1 width=23) (actual time=0.030..0.030 rows=0 loops=1)
          Index Cond: (((col1)::text = 'S:96'::text) AND (col2 = 
657::numeric))
          Filter: (date_trunc('day'::text, action_date) >= '2012-10-24 
00:00:00'::timestamp without time zone)
  Total runtime: 0.066 ms
This is on a bone-stock PG 9.1.6 from Ubuntu 12.04 LTS, with 
default_statistics increased to 500. The only thing I bumped up was the 
functional index between those two query plans.
But then I noticed something else. I reverted back to the old 500 
default for everything, and added an index:
create index idx_date_test_action_date_trunc_col1
     on date_test (date_trunc('day', action_date), col1);
I think we can agree that this index would be more selective than the 
one on date_trunc by itself. Yet:
  Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=77.055..77.055 
rows=0 loops=1)
    Sort Key: id, action_date
    Sort Method: quicksort  Memory: 25kB
    ->  Index Scan using idx_date_test_action_date_trunc on date_test 
(cost=0.00..9.37 rows=1 width=23) (actual time=77.046..77.046 rows=0 
loops=1)
          Index Cond: (date_trunc('day'::text, action_date) >= 
'2012-10-24 00:00:00'::timestamp without time zone)
          Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
  Total runtime: 77.091 ms
All I have to say about that is: wat.
-- 
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com
______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Shaun Thomas | 2012-10-24 19:54:52 | Re: Setting Statistics on Functional Indexes | 
| Previous Message | Tom Lane | 2012-10-24 19:11:48 | Re: Setting Statistics on Functional Indexes |