From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | bruce(at)momjian(dot)us, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: COUNT(*) and index-only scans |
Date: | 2011-10-15 22:58:49 |
Message-ID: | CAMkU=1xeG9M4cYgoMTGrY1GCF==JKn4mm+D-oTT4u-n=MDKdcA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Oct 10, 2011 at 9:48 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Jeff Janes wrote:
>> Kevin Grittner wrote:
>
>>> create table t (id int not null primary key);
>>> insert into t select generate_series(1, 1000000);
>>> vacuum freeze analyze;
>>> explain analyze select count(*) from t
>>> where id between 500000 and 500010;
>>>
>>> That gives you an index-only scan; but without the WHERE clause it
>>> uses a seq scan.
>>
>> If you convert the where clause to "where id is not null" it uses
>> the index only scan again, but only if you nudge it too with
>> enable_seqscan=off.
With a recent commit from (I assume) Tom, the "where id is not null"
is no longer needed.
> Clever way to get a full-table test.
>
> It turns out that for the above, with your trick to use the index
> only scan, it comes out 12% faster to do a seqscan, even when the
> table and index are fully cached (based on the average time of ten
> runs each way). There's very little overlap, so the difference looks
> real. But that's on a very narrow record, having just the one column
> used in the index. I added one wide column like this:
>
> alter table t add column x text;
> update t set x = (repeat(random()::text, (random() * 100)::int));
> cluster t USING t_pkey;
> vacuum freeze analyze;
>
> With that change the index-only scan time remained unchanged, while
> the seqscan time grew to about 2.6 times the index only scan time.
> That was mildly surprising for me, considering it was all still
> cached.
I used the pgbench_accounts table from pgbench -i -s 50, where all
data fits in shared_buffers, using the -f switch with either
set enable_seqscan=off;
select count(*) from pgbench_accounts;
or
set enable_indexonlyscan=off;
select count(*) from pgbench_accounts;
With just a single client, it was a toss-up. But with 8 concurrent
clients on a 8 CPU machine, the index-only scan was 50% faster. So
that is a nice win, even if well-designed apps probably shouldn't be
endlessly counting rows of an unchanging table using all available
CPUs in the first place.
Cheers,
Jeff
From | Date | Subject | |
---|---|---|---|
Next Message | Jan Urbański | 2011-10-15 23:28:56 | plpython SPI cursors |
Previous Message | Noah Misch | 2011-10-15 22:48:58 | Re: Pushing ScalarArrayOpExpr support into the btree index AM |