Quick Links

Re: index-only scans

From:	Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: index-only scans
Date:	2011-08-11 21:39:08
Message-ID:	CAF6yO=2MseNGb-B29SZN4+wtuVh5VZLheqpj2HfLndODaB2tXw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

2011/8/11 Robert Haas <robertmhaas(at)gmail(dot)com>:
> Please find attached a patch implementing a basic version of
> index-only scans. This patch is the work of my colleague Ibrar Ahmed
> and myself, and also incorporates some code from previous patches
> posted by Heikki Linnakanagas.

Great!.

>
> I'm able to demonstrate a significant performance benefit from this
> patch, even in a situation where it doesn't save any disk I/O, due to
> reduced thrashing of shared_buffers. Configuration settings:
>
> max_connections = 100
> shared_buffers = 400MB
> maintenance_work_mem = 256MB
> synchronous_commit = off
> checkpoint_segments = 100
> checkpoint_timeout = 30min
> checkpoint_completion_target = 0.9
> checkpoint_warning = 90s
> seq_page_cost = 1.0
> random_page_cost = 1.0
> effective_cache_size = 3GB
>
> Test setup:
>
> pgbench -i -s 50
> create table sample_data as select (random()*5000000)::int as aid,
> repeat('a', 1000) as filler from generate_series(1,100000);
>
> Test queries:
>
> select sum(aid) from sample_data a1 where exists (select * from
> pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567);
> select sum(aid) from sample_data a1 where exists (select * from
> pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);
>
> On my laptop, the first query executes in about 555 ms, while the
> second one takes about 1125 ms. Inspection via pg_buffercache reveals
> that the second one thrashes shared_buffers something fierce, while
> the first one does not. You can actually get the time for the first
> query down to about 450 ms if you can persuade PostgreSQL to cache the
> entire sample_data table - which is difficult, due the
> BufferAccessStrategy stuff - and as soon as you run the second query,
> it blows the table out of cache, so practically speaking you're not
> going to get that faster time very often. I expect that you could get
> an even larger benefit from this type of query if you could avoid
> actual disk I/O, rather than just buffer cache thrashing, but I
> haven't come up with a suitable test cases for that yet (volunteers?).
>
> There are several things about this patch that probably need further
> thought and work, or at least some discussion.
>
> 1. The way that nodeIndexscan.c builds up the faux heap tuple is
> perhaps susceptible to improvement. I thought about building a
> virtual tuple, but then what do I do with an OID column, if I have
> one? Or maybe this should be done some other way altogether.

Can this faux heap tuple be appended by the data from another index
once it has been created ? ( if the query involves those 2 index)

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

In response to

index-only scans at 2011-08-11 20:06:08 from Robert Haas

Responses

Re: index-only scans at 2011-08-12 02:39:02 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Marc Munro	2011-08-11 22:45:59	Re: [HACKERS] Dropping extensions
Previous Message	Tom Lane	2011-08-11 21:09:05	Re: error: could not find pg_class tuple for index 2662