From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Slow query: bitmap scan troubles
Date: 2013-01-07 00:03:04
Message-ID: CA+U5nMKvRhKviNYagHj+X-A__5uFFJEEGzBJe=f40ho2DbVk-Q@mail.gmail.com
Lists: pgsql-hackers pgsql-performance
On 6 January 2013 23:03, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On 6 January 2013 18:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> IIRC, one of my very first attempts to deal with this was to charge
>>> random_page_cost per level of index descended. This was such a horrid
>>> overestimate that it never went anywhere. I think that reflects that in
>>> practical applications, the upper levels of the index tend to stay in
>>> cache. We could ignore I/O on that assumption and still try to model
>>> CPU costs of the descent, which is basically what Jeff is proposing.
>>> My objection to his formula is mainly that it ignores physical index
>>> size, which I think is important to include somehow for the reasons
>>> I explained in my other message.
>
>> Having a well principled approach will help bring us towards a
>> realistic estimate.
>
> I thought about this some more and came up with what might be a
> reasonably principled compromise. Assume that we know there are N
> leaf entries in the index (from VACUUM stats) and that we know the
> root page height is H (from looking at the btree metapage). (Note:
> H starts at zero for a single-page index.) If we assume that the
> number of tuples per page, P, is more or less uniform across leaf
> and upper pages (ie P is the fanout for upper pages), then we have
> N/P = number of leaf pages
> N/P^2 = number of level 1 pages
> N/P^3 = number of level 2 pages
> N/P^(h+1) = number of level h pages
> Solving for the minimum P that makes N/P^(H+1) <= 1, we get
> P = ceil(exp(ln(N)/(H+1)))
> as an estimate of P given the known N and H values.
>
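To make that concrete, here's a minimal standalone sketch of the
estimate (plain C; the function name and the example N and H are mine,
not from the planner code):

    #include <math.h>
    #include <stdio.h>

    /*
     * Estimate the per-page fanout P from the leaf-entry count N and
     * the btree height H (H = 0 for a single-page index), i.e. the
     * smallest P with N/P^(H+1) <= 1:  P = ceil(exp(ln(N)/(H+1))).
     */
    static double
    estimate_fanout(double N, int H)
    {
        return ceil(exp(log(N) / (H + 1)));
    }

    int
    main(void)
    {
        /* hypothetical index: 10 million leaf entries, btree height 3 */
        printf("P = %.0f\n", estimate_fanout(10000000.0, 3));
        return 0;
    }

With those made-up numbers this prints P = 57.
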
> Now, if we consider only CPU costs of index descent, we expect
> about log2(P) comparisons to be needed on each of the H upper pages
> to be descended through, that is we have total descent cost
> cpu_index_tuple_cost * H * log2(P)
>
> If we ignore the ceil() step as being a second-order correction, this
> can be simplified to
>
> cpu_index_tuple_cost * H * log2(N)/(H+1)
>
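A quick sanity check on that simplification, as a hedged sketch
(standalone C with made-up N and H, not a patch against costsize.c;
cpu_index_tuple_cost here is just a local constant standing in for the
GUC's default):

    #include <math.h>
    #include <stdio.h>

    static const double cpu_index_tuple_cost = 0.005;  /* GUC default */

    int
    main(void)
    {
        double N = 10000000.0;      /* hypothetical leaf-entry count */
        int    H = 3;               /* hypothetical btree height     */

        /* exact form: estimate P, then charge log2(P) per upper page */
        double P = ceil(exp(log(N) / (H + 1)));
        double exact = cpu_index_tuple_cost * H * log2(P);

        /* simplified: drop the ceil(), so H*log2(P) = log2(N)*H/(H+1) */
        double simplified = cpu_index_tuple_cost * log2(N) * H / (H + 1);

        printf("P = %.0f  exact = %.6f  simplified = %.6f\n",
               P, exact, simplified);
        return 0;
    }

For these values the two forms agree to within about half a percent, so
dropping the ceil() really does look like a second-order correction.
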
> I propose this, rather than Jeff's formula of cpu_index_tuple_cost *
> log2(N), as our fudge factor. The reason I like this better is that
> the additional factor of H/(H+1) provides the correction I want for
> bloated indexes: if an index is bloated, the way that reflects into
> the cost of any particular search is that the number of pages to be
> descended through is larger than otherwise. The correction is fairly
> small, particularly for large indexes, but that seems to be what's
> expected given the rest of our discussion.
Seems good to have something with both N and H in it. This cost model
favours smaller indexes over larger ones, whether that's because
they're partial and so have a smaller N, or because the key values are
thinner and so give a lower H.
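A hedged toy comparison of that behaviour, using the simplified formula
above with made-up figures:

    #include <math.h>
    #include <stdio.h>

    static const double cpu_index_tuple_cost = 0.005;  /* GUC default */

    /* descent charge per the simplified formula: log2(N) * H/(H+1) */
    static double
    descent_cost(double N, int H)
    {
        return cpu_index_tuple_cost * log2(N) * H / (H + 1);
    }

    int
    main(void)
    {
        /* made-up figures: a full index, a partial index (smaller N),
         * and a thinner-keyed index needing one fewer level (lower H) */
        printf("full index     N=1e7 H=3: %.6f\n", descent_cost(1e7, 3));
        printf("partial index  N=1e6 H=2: %.6f\n", descent_cost(1e6, 2));
        printf("thin-key index N=1e7 H=2: %.6f\n", descent_cost(1e7, 2));
        return 0;
    }

Both the smaller-N and the lower-H cases come out cheaper than the full
index, as expected.
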
> We could further extend this by adding some I/O charge when the index is
> sufficiently large, as per Simon's comments, but frankly I think that's
> unnecessary. Unless the fan-out factor is really awful, practical-sized
> indexes probably have all their upper pages in memory. What's more, per
> my earlier comment, when you start to think about tables so huge that
> that's not true it really doesn't matter if we charge another
> random_page_cost or two for an indexscan --- it'll still be peanuts
> compared to the seqscan alternative.
Considering that we're trying to decide between various indexes on one
table, we don't have enough information to say which index the cache
favours, and the other aspects of caching are the same for all indexes
of any given size. So we can assume those effects cancel out for
comparison purposes, even if they're non-zero. And as you say, they're
negligible in comparison with bitmap index scans etc.
The only time I'd question that would be in the case of a nested-loop
join, but that's not important here.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services