From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal: be smarter about i/o patterns in index scan
Date: 2004-05-19 19:56:14
Message-ID: 1084996574.19249.8.camel@heat
Lists: pgsql-hackers
On Wed, 2004-05-19 at 11:56, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> > Are you saying that index scan results are sorted by something other
> > than the index key? Because in my scheme they would still be sorted by
> > index key.
>
> Not unless you add yet another sort step after the fetch step, that is
> the idea becomes
> 1. read index to get candidate TIDs
> 2. sort TIDs into physical order
> 3. read heap in physical order, check row visibility
> 4. sort selected rows into index order
>
> That would start to look like an unpleasant amount of overhead, since
> the sort would have to be over the entire output; you couldn't divide
> the scan into reasonable-size batches, whereas steps 1-3 can easily
> be run in batched style.
>
> Moreover, this'd not be any win compared to just dropping the assumption
> of sorted output; the planner could stick in the same sort for itself if
> it decided the cost was worth it.
>
> What you probably want to do is support both our traditional indexscan
> style and an "unsorted indexscan" plan type that delivers unsorted
> output (implementing steps 1-3 above). With appropriate cost estimation
> the planner could make an intelligent choice between the two.
> I'm too lazy to look in the archives right now, but my private notes
> about this say:
Thanks for the notes. Introducing a new query execution step sounds
like a better and easier idea than what I was going to do, which was to
rework some of the access methods to operate on vectors instead of
scalars. That idea looks increasingly difficult to implement.
> We could improve indexscan performance if we were to read a bunch of TIDs from
> the index, sort in page number order, and access the heap tuples in page
> number order. But probably need a batch size in the thousands of TIDs to get
> any really useful effect.
Depends on the query, but batches of just 400 TIDs already cut my
runtimes roughly in half.
> This would not work very well given the present
> assumption that we hold a pin on the index page until we have fetched the
> tuple (cf lazy VACUUM). Pinning lots of index pages is no good for
> concurrency or buffer usage. QUESTION: is that assumption really necessary?
> Can't we just make the code ignore TIDs that lead to deleted tuples? Worst
> case is that we fetch a newly-inserted tuple that does not actually have the
> expected index value, but won't we ignore it because of MVCC tqual rules?
> (This does NOT work for dirty read or SnapshotNow cases, but we could probably
> set things up to only try this optimization for standard snapshot reads.)
> An even nicer property is that we could loop inside the index AM to scoop up
> tuples, saving locking overhead. This might be worth doing even when we want
> to keep returning the tuples in index order.
I think this is *definitely* worth doing. The current implementation in
the vicinity of index_getnext wastes valuable opportunities to optimize
the I/O and/or to let the kernel do a better job. Descending through
index_getnext -> heap_getnext for every single tuple looks expensive in
my application.
Thanks again for your notes. I'll revisit the source code and see if I
can come up with a plan.
-jwb
PS: Regarding the archives, I've never seen them work; I use MARC instead.