Re: Why is indexonlyscan so darned slow?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 21:29:32
Message-ID: CAHyXU0xT=Se_zMdjEaC9W6uGuuP7LEOLKpJ7CRwrH8z+k4SQ-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 21, 2012 at 4:17 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Mon, May 21, 2012 at 1:42 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>
>>> Earlier you said that this should be an ideal setup for IOS.  But it
>>> isn't really--the ideal set up is one in which the alternative to an
>>> IOS is a regular index scan which makes many uncached scattered reads
>>> into the heap.  I don't think that that situation can't really be
>>> engineered with a where-less query.
>>
>> Can you give me some suggested comparisons which *would* be ideal, then?
>
> Are you looking for vaguely real-life examples, or highly contrived
> examples used to dissect the server?
>
> For vaguely real life, take your example of pgbench -i -s200 -F 50,
> and I have 2Gig RAM, which seems to be the same as you do.
>
> With select only work load (pgbench -S -M prepared -T 30), I get
>
> tps = 193
>
> But now enable index-only scans:
>
> psql -c "create index on pgbench_accounts(aid, abalance);"
>
> and it goes up to.
>
> tps = 10137

Right -- the main driver here is that your index fits neatly in ram
and the heap does not -- so you're effectively measuring the
difference between a buffered and non-buffered access. That's highly
contrived as you noted and unlikely to come up all *that* often in the
real world.

Generally though the real world wins (although the gains will be
generally less spectacular) are heavily i/o bound queries where the
indexed subset of data you want is nicely packed and the (non
clustered) heap records are all over the place. By skipping the semi
random heap lookups you can see enormous speedups. I figure 50-90%
improvement would be the norm there, but this is against queries that
are taking forever, being i/o bound.

See here: http://www.devheads.net/database/postgresql/performance/index-all-necessary-columns-postgres-vs-mssql.htm
for a 'in the wild' gripe about about not having index scans.

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2012-05-21 22:08:51 Re: read() returns ERANGE in Mac OS X
Previous Message Jeff Janes 2012-05-21 21:17:26 Re: Why is indexonlyscan so darned slow?