Quick Links

Re: Setting Statistics on Functional Indexes

From:	Shaun Thomas <sthomas(at)optionshouse(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Setting Statistics on Functional Indexes
Date:	2012-10-24 19:31:11
Message-ID:	508841FF.9040303@optionshouse.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On 10/24/2012 02:11 PM, Tom Lane wrote:

> It's not particularly (not that you've even defined what you think
> "optimistic" is, much less mentioned what baseline you're comparing
> to).

The main flaw with my example is that it's random. But I swear I'm not
making it up! :)

There seems to be a particularly nasty edge case we're triggering, then.
Like I said, it's worse when col1+col2 don't match anything. In that
case, it's using the trunc index on the date column, which has
demonstrably worse performance. Here are the two analyzes I got
before/after front-loading statistics.

Before stats increase:

Sort (cost=9.38..9.39 rows=1 width=23) (actual time=78.282..78.282
rows=0 loops=1)
Sort Key: id, action_date
Sort Method: quicksort Memory: 25kB
-> Index Scan using idx_date_test_action_date_trunc on date_test
(cost=0.00..9.37 rows=1 width=23) (actual time=78.274..78.274 rows=0
loops=1)
Index Cond: (date_trunc('day'::text, action_date) >=
'2012-10-24 00:00:00'::timestamp without time zone)
Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
Total runtime: 78.317 ms

And then after. I used your unofficial trick to set it to 1000:

alter index idx_date_test_action_date_trunc
alter column date_trunc set statistics 1000;
analyze date_test;

Sort (cost=9.83..9.83 rows=1 width=23) (actual time=0.038..0.038
rows=0 loops=1)
Sort Key: id, action_date
Sort Method: quicksort Memory: 25kB
-> Index Scan using idx_date_test_col1_col2 on date_test
(cost=0.00..9.82 rows=1 width=23) (actual time=0.030..0.030 rows=0 loops=1)
Index Cond: (((col1)::text = 'S:96'::text) AND (col2 =
657::numeric))
Filter: (date_trunc('day'::text, action_date) >= '2012-10-24
00:00:00'::timestamp without time zone)
Total runtime: 0.066 ms

This is on a bone-stock PG 9.1.6 from Ubuntu 12.04 LTS, with
default_statistics increased to 500. The only thing I bumped up was the
functional index between those two query plans.

But then I noticed something else. I reverted back to the old 500
default for everything, and added an index:

create index idx_date_test_action_date_trunc_col1
on date_test (date_trunc('day', action_date), col1);

I think we can agree that this index would be more selective than the
one on date_trunc by itself. Yet:

Sort (cost=9.38..9.39 rows=1 width=23) (actual time=77.055..77.055
rows=0 loops=1)
Sort Key: id, action_date
Sort Method: quicksort Memory: 25kB
-> Index Scan using idx_date_test_action_date_trunc on date_test
(cost=0.00..9.37 rows=1 width=23) (actual time=77.046..77.046 rows=0
loops=1)
Index Cond: (date_trunc('day'::text, action_date) >=
'2012-10-24 00:00:00'::timestamp without time zone)
Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
Total runtime: 77.091 ms

All I have to say about that is: wat.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

In response to

Re: Setting Statistics on Functional Indexes at 2012-10-24 19:11:48 from Tom Lane

Responses

Re: Setting Statistics on Functional Indexes at 2012-10-24 19:54:52 from Shaun Thomas

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Shaun Thomas	2012-10-24 19:54:52	Re: Setting Statistics on Functional Indexes
Previous Message	Tom Lane	2012-10-24 19:11:48	Re: Setting Statistics on Functional Indexes