Re: Setting Statistics on Functional Indexes

From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Setting Statistics on Functional Indexes
Date: 2012-10-29 15:26:57
Message-ID: 508EA041.6090101@optionshouse.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 10/26/2012 04:08 PM, Tom Lane wrote:

> So the bottom line is that this is a case where you need a lot of
> resolution in the histogram. I'm not sure there's anything good
> we can do to avoid that.

I kinda hoped it wouldn't be something like that. For the particularly
painful instance, it was easy to replace the index with a better (if
larger) dual index and drop the bad old one. But in some cases, I'm
having to maintain two indexes that make me sad:

CREATE TABLE activity (
activity_id SERIAL NOT NULL PRIMARY KEY,
account_id BIGINT NOT NULL,
action_date TIMESTAMP WITHOUT TIME ZONE
);

CREATE INDEX idx_activity_action_date_account_id
ON activity (action_date, activity_id);

CREATE INDEX idx_activity_account_id_action_date
ON activity (activity_id, action_date);

Because in the first case, we needed the action_date to be first for
analytics that *don't* supply account_id. But in the second case, we
need the account_id first, so we can get the most recent action(s) for
that account without a very expensive backwards index scan on the first
index.

I know that current_date seems like an edge case, but I can't see how
getting the most recent activity for something is an uncommon activity.
Tip tracking is actually the most frequent pattern in the systems I've
seen. Admittedly, those are almost always high TPS trading systems.

At this point, I'm almost willing to start putting in optimization
fences to force it along the right path. Which is gross, because that's
effectively no better than Oracle hints. But I also don't like setting
my statistics to 5000+ on problematic column/index combos to get the
right heuristics, or having semi-duplicate multi-column indexes to
exploit sorting performance.

I mean, I get it. I just wonder if this particular tweak isn't more of a
regression than initially thought.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Shaun Thomas 2012-10-29 16:17:15 Re: Tons of free RAM. Can't make it go away.
Previous Message Albe Laurenz 2012-10-29 15:04:00 Re: Replaying 48 WAL files takes 80 minutes